Abstract

In this work, we consider a discrete-time stationary Rayleigh flat-fading channel with unknown channel state
information at the transmitter and the receiver side. The law of the channel is presumed to be known to the receiver. In
addition, we assume the power spectral density of the fading process to be compactly supported. For independent
identically distributed (i.i.d.) zero-mean proper Gaussian input distributions, we investigate the achievable rate. One
of the main contributions of the present paper is the derivation of two new upper bounds on the achievable rate
with zero-mean proper Gaussian input symbols. The first one holds only for the special case of a rectangular power
spectral density and depends on the SNR and the spread of the power spectral density. Together with a lower bound
on the achievable rate, which is achievable with i.i.d. zero-mean proper Gaussian input symbols, we have found a
set of bounds which is tight in the sense that their difference is bounded. Furthermore, we show that the high SNR
slope is characterized by a pre-log of $1 - 2f_d$, where $f_d$ is the normalized maximum Doppler frequency. This pre-log
is equal to the high SNR pre-log of the peak power constrained capacity. In addition, we derive an alternative
upper bound on the achievable rate with i.i.d. input symbols which is based on the one-step channel prediction
error variance. The novelty lies in the fact that this bound is not restricted to peak power constrained input symbols
like known bounds, e.g., in [1]. Therefore, the derived upper bound can also be used to evaluate the achievable
rate with i.i.d. proper Gaussian input symbols. We also compare the derived bounds on the achievable rate
with i.i.d. zero-mean proper Gaussian input symbols with bounds on the peak power constrained capacity given in
[?], [?], and [?]. Finally, we compare the achievable rate with i.i.d. zero-mean proper Gaussian input symbols with
the achievable rate using synchronized detection in combination with a solely pilot based channel estimation.
I. INTRODUCTION

In this paper, we consider a stationary Rayleigh flat-fading channel with temporal correlation. We
assume that the channel state information is unknown to the transmitter and the receiver, while the
receiver is aware of the channel law. The capacity of this scenario is particularly important, as it applies to
many realistic mobile communication systems. In order to acquire channel state information, the temporal
correlation of the channel can be exploited by the system, e.g., by inserting training sequences at the transmit
side. While these training sequences can be understood as a specific type of code [3], we are interested
in the achievable rate on this channel irrespective of the use of training sequences.

The capacity of fading channels where the channel state information is unknown, sometimes referred
to as noncoherent capacity, has received a lot of attention in the recent literature. E.g., [?] considers a
block fading channel model, where the channel is assumed to be constant over a block of $N$ symbols
and changes independently from block to block. This model is nonstationary and, therefore, different
from the one we consider in the present work. On the other hand, in [?] and [6] the achievable rate of
time-continuous fading channels has been examined under the assumption of the use of training sequences
for channel tracking and a coherent detection based on the acquired channel estimate. Furthermore, in [?]
the asymptotic high SNR capacity of a stationary Gaussian fading channel has been investigated,
whereas in [?] an approximate behavior of the capacity for different SNR regimes has been considered.
In addition, in [1] bounds on the capacity for temporally correlated Rayleigh fading channels with a peak
power constraint have been derived with specific emphasis on the low SNR regime. Furthermore, the case
of frequency selective stationary fading channels has been discussed, e.g., in [8] and [9].

The main goal of the present work is the investigation of the achievable rate with independent identically
distributed (i.i.d.) zero-mean proper Gaussian input symbols on noncoherent stationary discrete-time
Rayleigh flat-fading channels. On the one hand, i.i.d. zero-mean proper Gaussian input symbols are
capacity achieving in case the channel state is perfectly known at the receiver. Even though they are not
capacity achieving for the given noncoherent scenario [10], the achievable rate with i.i.d. zero-mean proper
Gaussian input symbols is highly interesting, as in many cases the capacity-achieving input distribution
becomes peaky and, thus, impractical for real system design. In contrast, i.i.d. zero-mean proper Gaussian
input distributions serve well to upper-bound the achievable rate with practical modulation and coding
schemes, see also [11] and [12]. In [11] the achievable rate with i.i.d. Gaussian inputs has been studied
for the block fading channel. Furthermore, in [13] bounds on the mutual information with Gaussian input
distributions have been derived for a Gauss-Markov fading channel, whose PSD has an unbounded support.
The results in [13] indicate that at moderate SNR and/or slow fading, Gaussian inputs still work well. In
contrast to these publications, in the present work we study the achievable rate with i.i.d. zero-mean proper
Gaussian input symbols for the case of a discrete-time stationary Rayleigh flat-fading channel, where the
power spectral density (PSD) of the channel fading process is characterized by a compact support with
a normalized maximum Doppler frequency $f_d < 0.5$, i.e., nonregular fading [14], as it, e.g., corresponds
to the widely used Jakes' model [15].

A. Contributions 

Within the present work, we consider a discrete-time stationary Rayleigh flat-fading channel with a 
compactly supported PSD. The channel fading process is assumed to be nonregular. Furthermore, it is 
assumed that the channel state information is unknown to the transmitter and the receiver, while the 
receiver is aware of the channel law. In this context we obtain the following. 

In Section III we give an upper bound on the achievable rate with i.i.d. zero-mean proper Gaussian
input symbols for the special case of a rectangular PSD, depending on the SNR and the spread of the PSD.
In particular, the lower bound on the conditional output entropy rate $h'(\mathbf{y}|\mathbf{x})$ for a rectangular
PSD that is used for this purpose is, to the best of our knowledge, new. The particularity of the given lower
bound on $h'(\mathbf{y}|\mathbf{x})$ lies in the fact that its derivation is not based on a peak power constraint,
enabling its evaluation for Gaussian input symbols. The assumption of a rectangular PSD is usually made in
typical communication system design. For comparison, we give a lower bound on the achievable rate, holding
for an arbitrary PSD with compact support, which is already known from [16]. With this lower and this upper
bound on the achievable rate with i.i.d. zero-mean proper Gaussian inputs, we have found a set of bounds which is
tight in the sense that its difference is bounded by $(1 + 2f_d)\gamma$ [nat/channel use] for all SNRs, where
$\gamma \approx 0.57721$ is the Euler constant and $f_d$ is the normalized maximum Doppler frequency. Furthermore,
in Section III-G we discuss the relation of the bounds on the achievable rate with i.i.d. zero-mean proper
Gaussian input symbols to bounds on the peak power constrained capacity given in [?] and [?].

We show in Section III-E that the asymptotic high SNR slope (pre-log) of the achievable rate with
i.i.d. zero-mean proper Gaussian input symbols is given by $1 - 2f_d$. This exactly corresponds to the high
SNR behavior of the peak power constrained capacity as discussed in [?]. Furthermore, in Section III-F
we compare the bounds on the achievable rate with i.i.d. zero-mean proper Gaussian input symbols to
the high SNR asymptotes on the peak power constrained capacity given in [3].
Additionally, in Section IV we derive an alternative upper bound on the achievable rate with i.i.d. input
symbols. This upper bound, whose derivation relies on the assumption of i.i.d. input symbols, is based on
the one-step channel prediction error variance and, thus, is linked to a physical interpretation. In contrast,
the bounds given in Section III are based on a purely mathematical derivation. Bounds on the capacity
with peak power constrained input distributions that are based on the one-step channel prediction error
variance already exist, see, e.g., [3]. However, for the derivation of the channel prediction based
capacity bounds in [3], the peak power constraint has been required for technical reasons. In contrast,
our new upper bound based on the channel prediction error variance is not restricted to peak power
constrained input symbols, enabling its evaluation for Gaussian inputs. However, due to the restriction
to i.i.d. input symbols, we only get an upper bound on the achievable rate and not on the capacity. We
evaluate this upper bound on the achievable rate with i.i.d. input symbols, on the one hand, for zero-mean
proper Gaussian input symbols. In contrast to the upper bound on the achievable rate with i.i.d. zero-mean
proper Gaussian input symbols derived in Section III, which holds only for a rectangular PSD of
the fading process, the upper bound based on the channel prediction error variance holds for an arbitrary
PSD with compact support.

On the other hand, and this is outside the main focus of the present paper, we also evaluate the upper
bound on the achievable rate with i.i.d. input symbols based on the channel prediction error variance for
peak power constrained input symbols, and compare this new upper bound to capacity bounds given in [?].

Finally, we compare the achievable rate with i.i.d. zero-mean proper Gaussian input symbols to the
achievable rate with synchronized detection and a solely pilot based channel estimation. With synchronized
detection and a solely pilot based channel estimation, the channel is estimated based only on
pilot symbols and then, in a second separate step, the channel estimate is used for coherent detection.
Thus, this comparison shows how far such systems stay below the achievable rate with Gaussian code
books, using no pilot symbols. This comparison might give an indication of the possible gain of advanced
receivers using a joint processing of pilot and data symbols, in comparison to the separate processing.
One instance of such a joint processing of pilot and data symbols is the approach of iterative code-aided
channel estimation and decoding [17], where the channel estimation is iteratively enhanced based
on reliability information on the data symbols delivered by the decoder. The enhanced channel estimation
then allows for an enhanced decoding in a further iteration, and so on.

The rest of this paper is organized as follows. In Section II we introduce the channel model including
a discussion of its limitations. Section III presents the derivation of the bounds on the achievable rate with
i.i.d. zero-mean proper Gaussian input symbols, which are based on a purely mathematical derivation.
This includes the discussion of their tightness, the evaluation of the high SNR behavior, the comparison to
the high SNR asymptotes for the capacity given in [3], and the discussion of their relation to the bounds
on the peak power constrained capacity given in [?] and [?]. Afterwards, in Section IV an upper bound on
the achievable rate with i.i.d. input symbols based on the channel prediction error variance is derived and
evaluated for proper Gaussian inputs and for peak power constrained inputs. Subsequently, in Section V
the achievable rate with i.i.d. zero-mean proper Gaussian input symbols is compared to the achievable
rate with synchronized detection and a solely pilot based channel estimation, before we give a conclusion
in Section VI.

II. System Model 

We consider an ergodic discrete-time jointly proper Gaussian [18] flat-fading channel, whose output at
time $k$ is given by

$$y_k = h_k \cdot x_k + n_k \tag{1}$$

where $x_k \in \mathbb{C}$ is the complex-valued channel input, $h_k \in \mathbb{C}$ represents the channel fading coefficient, and
$n_k \in \mathbb{C}$ is additive white Gaussian noise. The processes $\{h_k\}$, $\{x_k\}$, and $\{n_k\}$ are assumed to be mutually
independent.
We assume that the noise $\{n_k\}$ is a sequence of i.i.d. proper Gaussian random variables of zero mean
and variance $\sigma_n^2$. The stationary channel fading process $\{h_k\}$ is zero-mean jointly proper Gaussian. In
addition, the fading process is time-selective and characterized by its autocorrelation function

$$r_h(l) = \mathrm{E}\left[h_{k+l}\, h_k^*\right]. \tag{2}$$

Its variance is given by $r_h(0) = \sigma_h^2$.

The normalized PSD of the channel fading process is defined by

$$S_h(f) = \sum_{l=-\infty}^{\infty} r_h(l)\, e^{-j 2\pi l f}, \quad |f| < 0.5 \tag{3}$$

where we assume that the PSD exists and use the definition $j = \sqrt{-1}$. Here, the frequency $f$ is normalized
with respect to the symbol duration $T_{\mathrm{Sym}}$. In the following, we use this normalized PSD and, thus, refer
to it as PSD for simplification. For a jointly proper Gaussian process, the existence of the PSD implies
ergodicity [19]. As the channel fading process $\{h_k\}$ is assumed to be stationary, $S_h(f)$ is real-valued.
Because of the limitation of the velocity of the transmitter, the receiver, and of objects in the environment,
the spread of the PSD is limited, and we assume it to be compactly supported within the interval $[-f_d, f_d]$
with $0 < f_d < 0.5$, i.e., $S_h(f) = 0$ for $f \notin [-f_d, f_d]$. The parameter $f_d$ corresponds to the normalized
maximum Doppler shift and, thus, indicates the dynamics of the channel. To ensure ergodicity, we exclude
the case $f_d = 0$. Following the definition given in [14], this fading channel is sometimes referred to as
nonregular.

For technical reasons, in some of the proofs, i.e., for the calculation of the upper bound on the achievable
data rate in Section III, we restrict to autocorrelation functions $r_h(l)$ which are absolutely summable, i.e.,

$$\sum_{l=-\infty}^{\infty} |r_h(l)| < \infty \tag{4}$$

instead of the more general class of square summable autocorrelation functions, i.e.,

$$\sum_{l=-\infty}^{\infty} |r_h(l)|^2 < \infty. \tag{5}$$

The assumption of absolutely summable autocorrelation functions is not a severe restriction. E.g., the
important rectangular PSD, see (7) below, can be approximated arbitrarily closely by a PSD with the
shape corresponding to the transfer function of a raised cosine filter, whose autocorrelation function is
absolutely summable, see Appendix A. Therefore, in the rest of this work, we often evaluate the derived
bounds on the achievable rate for a rectangular PSD of the channel fading process, although some of the
derivations are based on the assumption of an absolutely summable autocorrelation function.

A common model concerning the temporal correlation of the channel fading process is the Jakes'
model [15], for which the corresponding PSD of the discrete-time fading process $S_h(f)$ in (3) is given
by

$$S_h(f)\big|_{\mathrm{Jakes}} = \begin{cases} \dfrac{\sigma_h^2}{\pi \sqrt{f_d^2 - f^2}} & \text{for } |f| < f_d \\ 0 & \text{for } f_d \le |f| \le 0.5 \end{cases} \tag{6}$$

This PSD can be derived analytically for a dense scatterer environment with a vertical receive antenna
and a constant azimuthal gain, a uniform distribution of signals arriving at all angles with phases being
independently distributed over all angles, i.e., in the interval $[0, 2\pi)$, based on a sum of sinusoids [15].
Often the Jakes' PSD in (6) is approximated by the following rectangular PSD

$$S_h(f)\big|_{\mathrm{rect}} = \begin{cases} \dfrac{\sigma_h^2}{2 f_d} & \text{for } |f| \le f_d \\ 0 & \text{for } f_d < |f| \le 0.5 \end{cases} \tag{7}$$
For the derivation of the upper bound on the achievable rate in Section III we restrict to rectangular PSDs
for mathematical tractability.
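The relation between a rectangular PSD and its autocorrelation function can be checked numerically. The following minimal sketch assumes that the rectangular PSD takes the constant value $\sigma_h^2/(2f_d)$ on $[-f_d, f_d]$ (the level for which $r_h(0) = \sigma_h^2$) and compares the inverse of (3), evaluated by a midpoint rule, with the closed form $\sigma_h^2 \sin(2\pi f_d l)/(2\pi f_d l)$. The function names and parameter values are illustrative assumptions and do not appear in the text.

```python
import math

def rect_psd(f, f_d, sigma_h2=1.0):
    """Rectangular PSD: assumed level sigma_h2 / (2 f_d) on [-f_d, f_d], zero elsewhere."""
    return sigma_h2 / (2.0 * f_d) if abs(f) <= f_d else 0.0

def autocorr_numeric(l, f_d, sigma_h2=1.0, steps=20000):
    """r_h(l) = int_{-f_d}^{f_d} S_h(f) cos(2 pi l f) df, evaluated by the midpoint rule.

    The PSD is real and even, so the complex exponential reduces to a cosine.
    """
    width = 2.0 * f_d / steps
    total = 0.0
    for i in range(steps):
        f = -f_d + (i + 0.5) * width
        total += rect_psd(f, f_d, sigma_h2) * math.cos(2.0 * math.pi * l * f)
    return total * width

def autocorr_closed_form(l, f_d, sigma_h2=1.0):
    """Closed form sigma_h2 * sin(2 pi f_d l) / (2 pi f_d l), with r_h(0) = sigma_h2."""
    if l == 0:
        return sigma_h2
    arg = 2.0 * math.pi * f_d * l
    return sigma_h2 * math.sin(arg) / arg

if __name__ == "__main__":
    for lag in range(6):
        print(lag, autocorr_numeric(lag, 0.1), autocorr_closed_form(lag, 0.1))
```

The closed form decays only like $1/l$, which illustrates why the rectangular PSD itself is square summable but has to be approximated, e.g., by a raised cosine shape, when absolute summability is required.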

Typical fading channels, as they are observed in mobile communication environments, are characterized
by relatively small normalized Doppler frequencies $f_d$ in the regime of $f_d \ll 0.1$. Therefore, the restriction
to channels with $f_d < 0.5$, i.e., nonregular fading, in the present work is reasonable. Although in practical
scenarios the observed channel dynamics are very small, within this work we consider the range of
$0 < f_d < 0.5$ to get a thorough understanding of the behavior of the bounds on the achievable rate.

A. Matrix-Vector Notation 

We base the derivation of bounds on the achievable rate on the following matrix-vector notation of the 
system model: 

y = Hx + n = Xh + n (8) 

where the vector $\mathbf{x}$ is defined as

$$\mathbf{x} = [x_1, \ldots, x_N]^T. \tag{9}$$

The vectors $\mathbf{y}$ and $\mathbf{n}$ are defined analogously. The matrix $\mathbf{H}$ is diagonal and defined as $\mathbf{H} = \mathrm{diag}(\mathbf{h})$ with
$\mathbf{h} = [h_1, \ldots, h_N]^T$. Here the $\mathrm{diag}(\cdot)$ operator generates a diagonal matrix whose diagonal elements are
given by the argument vector. The diagonal matrix $\mathbf{X}$ is given by $\mathbf{X} = \mathrm{diag}(\mathbf{x})$. The quantity $N$ is the
number of considered symbols. Later on, we investigate the case of $N \to \infty$ to evaluate the achievable
rate.

Using this vector notation, we express the temporal correlation of the fading process by the correlation 
matrix 

$$\mathbf{R}_h = \mathrm{E}\left[\mathbf{h}\mathbf{h}^H\right] \tag{10}$$

which has a Hermitian Toeplitz structure.

Concerning the input distribution, unless otherwise stated, we make the assumption that the symbols
$x_k$ are i.i.d. zero-mean proper Gaussian distributed with an average power $\sigma_x^2$. Thus, the average SNR is
given by¹

$$\rho = \frac{\sigma_x^2 \sigma_h^2}{\sigma_n^2}. \tag{11}$$



B. Limitations of the Symbol Rate Discrete-Time Model 

To discuss the limitations of the symbol rate discrete-time model given in (1), we start from the
underlying appropriately bandlimited continuous-time model, where the channel output is given by

$$y(t) = h(t) \cdot s(t) + n(t) \tag{12}$$

with $h(t)$ being the continuous-time channel fading process, i.e., the corresponding discrete-time process
$h_k$ is given by

$$h_k = h(k T_{\mathrm{Sym}}) \tag{13}$$

¹ Remark: Only in case there is no peak power constraint, as, e.g., in the case of Gaussian input symbols, is the average SNR $\rho$ in
general equal to the actual average SNR. In contrast, in case of an additional peak power constraint, the achievable rate is in general not
maximized by using the maximum average transmit power $\sigma_x^2$. Thus, in this case $\rho$ does not necessarily correspond to the actual average
SNR and, therefore, we name it nominal average SNR when considering a peak power constraint. For further discussion see Section III-F
and Section III-G.
where $T_{\mathrm{Sym}}$ is the symbol duration. Analogously, the continuous-time and the discrete-time additive noise
and channel output processes are related by

$$n_k = n(k T_{\mathrm{Sym}}) \tag{14}$$
$$y_k = y(k T_{\mathrm{Sym}}). \tag{15}$$

The continuous-time transmit process $s(t)$ is given by

$$s(t) = \sum_{k=-\infty}^{\infty} x_k \cdot g(t - k T_{\mathrm{Sym}}) \tag{16}$$



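where $g(t)$ is the transmit pulse. We assume the use of bandlimited transmit pulses, which, therefore,
have an infinite impulse response. In typical systems, root-raised cosine pulses are often used such that, in
combination with the matched filter at the receiver, intersymbol interference is minimized. Their normalized
frequency response $G(f)$ is given by

$$G(f) = \sqrt{G_{\mathrm{RC}}(f)} \tag{17}$$

with $G_{\mathrm{RC}}(f)$ being the transfer function of the raised cosine filter

$$G_{\mathrm{RC}}(f) = \begin{cases} T_{\mathrm{Sym}} & \text{for } |f| \le \frac{1-\beta_{ro}}{2} \\ \dfrac{T_{\mathrm{Sym}}}{2}\left[1 + \cos\left(\dfrac{\pi}{\beta_{ro}}\left(|f| - \dfrac{1-\beta_{ro}}{2}\right)\right)\right] & \text{for } \frac{1-\beta_{ro}}{2} < |f| \le \frac{1+\beta_{ro}}{2} \\ 0 & \text{otherwise} \end{cases} \tag{18}$$

where $0 \le \beta_{ro} \le 1$ is the roll-off factor.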

The continuous-time input/output relation in (12) has the following stochastic representation in the frequency
domain

$$S_y(f) = S_h(f) \star S_s(f) + S_n(f) \tag{19}$$

where $\star$ denotes convolution and $S_y(f)$, $S_h(f)$, $S_s(f)$, and $S_n(f)$ are the normalized power spectral
densities of the continuous-time processes $y(t)$, $h(t)$, $s(t)$, and $n(t)$, e.g.,

$$S_s(f) = \int_{-\infty}^{\infty} \mathrm{E}\left[s(t+\tau)\, s^*(t)\right] e^{-j 2\pi f \tau}\, d\tau \tag{20}$$

and correspondingly for the other PSDs. Here, we always assume normalization with $1/T_{\mathrm{Sym}}$.

We are interested in the normalized bandwidth of the component $S_h(f) \star S_s(f)$, i.e., the component
containing information on the transmitted sequence $\{x_k\}$. The normalized bandwidth of the transmit
signal $s(t)$ directly corresponds to the normalized bandwidth of the transmit pulse $g(t)$, which is at least
1 (corresponding to $\beta_{ro} = 0$). The normalized bandwidth of the channel fading process is given by $2f_d$.
Thus, the normalized bandwidth of the component $S_h(f) \star S_s(f)$ is at least $1 + 2f_d$. To get a
sufficient statistic, we would have to sample the channel output $y(t)$ at least with a frequency of $(1 + 2f_d)/T_{\mathrm{Sym}}$
(for $\beta_{ro} = 0$). As the discrete-time channel output process $\{y_k\}$ is a sampled version of $y(t)$ with the
rate $1/T_{\mathrm{Sym}}$, the discrete-time observation process $\{y_k\}$ is not a sufficient statistic of $y(t)$. This shows the
limitations of the symbol rate discrete-time system model in (1). As in typical systems the normalized
maximum Doppler frequency $f_d$ is small in comparison to the symbol rate $1/T_{\mathrm{Sym}}$, the amount of discarded
information is negligible. Besides this, in typical systems channel estimation is also performed at symbol
rate and, therefore, also exhibits the loss due to the lack of a sufficient statistic. In addition, the majority of
the current literature on the study of the capacity of stationary Rayleigh fading channels, e.g., [?] or [?],
is based on symbol rate discrete-time input-output relations and therefore does not address the question of a
sufficient statistic. Nevertheless, these considerations should be kept in mind in the following, especially
as we examine the derived bounds not only for very small values of $f_d$.
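As a small worked example of the bandwidth argument above, consider a zero roll-off factor and the value $f_d = 0.01$, which is an illustrative assumption and not a value taken from the text:

```latex
\begin{align*}
\text{normalized bandwidth of } S_h(f) \star S_s(f) &\ge 1 + 2 f_d = 1 + 2 \cdot 0.01 = 1.02, \\
\text{sampling rate required for a sufficient statistic} &\ge \frac{1.02}{T_{\mathrm{Sym}}} .
\end{align*}
```

Under this assumption, sampling at the symbol rate $1/T_{\mathrm{Sym}}$ falls short of the required rate by roughly 2%, whereas for $f_d$ close to 0.5 the required rate approaches twice the symbol rate.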
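III. Bounds on the Achievable Rate

The information theoretic capacity is defined by

$$C = \lim_{N \to \infty} \sup_{\mathcal{P}} \frac{1}{N} \mathcal{I}(\mathbf{y}; \mathbf{x}) \quad \text{[nat/cu]} \tag{21}$$

where $\mathcal{I}(\mathbf{y}; \mathbf{x})$ is the mutual information between $\mathbf{x}$ and $\mathbf{y}$, where cu is an abbreviation for channel use,
and where the supremum is taken over the set $\mathcal{P}$ containing all input distributions with an average power
constrained to $\sigma_x^2$, i.e.,

$$\mathcal{P} = \left\{ p(\mathbf{x}) \,\middle|\, \mathbf{x} \in \mathbb{C}^N,\ \frac{1}{N} \mathrm{E}\left[\mathbf{x}^H \mathbf{x}\right] \le \sigma_x^2 \right\}. \tag{22}$$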

As the PSD of the fading process in (3) is assumed to exist, and as the channel fading process is jointly
proper Gaussian, the channel fading process is ergodic. Therefore, the information theoretic capacity given
in (21) and the operational capacity coincide [19], i.e., for each rate $R < C$ there exists a code for which
the probability of an erroneously decoded codeword approaches zero in the limit of an infinite codeword
length.

As has already been discussed, the main focus of the present paper is not the discussion of the
capacity but of the achievable rate with i.i.d. zero-mean proper Gaussian input symbols. As this kind of
input distribution is in general not capacity achieving, we use the term achievable rate $R$, which then
directly corresponds to the mutual information rate $\mathcal{I}'(\mathbf{y}; \mathbf{x})$, i.e.,

$$R = \mathcal{I}'(\mathbf{y}; \mathbf{x}) = \lim_{N \to \infty} \frac{1}{N} \mathcal{I}(\mathbf{y}; \mathbf{x}) \tag{23}$$

where, as described in Section II, the elements of $\mathbf{x}$ are i.i.d. zero-mean proper Gaussian distributed.

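A. The Mutual Information Rate $\mathcal{I}'(\mathbf{y}; \mathbf{x})$

In general, by means of the chain rule, the mutual information rate in (23) can be expanded as [20]

$$\mathcal{I}'(\mathbf{y}; \mathbf{x}) = \mathcal{I}'(\mathbf{y}; \mathbf{x}|\mathbf{h}) - \mathcal{I}'(\mathbf{x}; \mathbf{h}|\mathbf{y}) \tag{24}$$

where $\mathcal{I}'(\mathbf{y}; \mathbf{x}|\mathbf{h})$ is the mutual information rate in case the channel is known at the receiver, i.e., the mutual
information rate of the coherent channel, and $\mathcal{I}'(\mathbf{x}; \mathbf{h}|\mathbf{y})$ is the penalty due to the channel uncertainty. It
is interesting to note that the penalty term can be further separated as follows:

\begin{align}
\mathcal{I}'(\mathbf{x}; \mathbf{h}|\mathbf{y}) &\overset{(a)}{=} \mathcal{I}'(\mathbf{y}, \mathbf{x}; \mathbf{h}) - \mathcal{I}'(\mathbf{y}; \mathbf{h}) \notag \\
&\overset{(b)}{=} \mathcal{I}'(\mathbf{y}; \mathbf{h}|\mathbf{x}) + \mathcal{I}'(\mathbf{h}; \mathbf{x}) - \mathcal{I}'(\mathbf{y}; \mathbf{h}) \notag \\
&\overset{(c)}{=} \mathcal{I}'(\mathbf{y}; \mathbf{h}|\mathbf{x}) - \mathcal{I}'(\mathbf{y}; \mathbf{h}) \tag{25}
\end{align}

where for (a) and (b) we use the chain rule for mutual information and for (c) we exploit the fact that
the mutual information between the channel fading process described by $\mathbf{h}$ and the input sequence $\mathbf{x}$ is
zero due to the independence of $\mathbf{h}$ and $\mathbf{x}$ and, thus,

$$\mathcal{I}(\mathbf{h}; \mathbf{x}) = 0. \tag{26}$$

With (25), the penalty term corresponds to the difference between the knowledge on the channel
$\mathbf{h}$ that can be obtained from the observation $\mathbf{y}$ while knowing the transmit sequence $\mathbf{x}$ in comparison to
not knowing it.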

However, the derivation of the bounds on the mutual information rate $\mathcal{I}'(\mathbf{y}; \mathbf{x})$ in the present work is
based on the following straightforward separation of $\mathcal{I}'(\mathbf{y}; \mathbf{x})$ into the differential entropy rates

$$\mathcal{I}'(\mathbf{y}; \mathbf{x}) = h'(\mathbf{y}) - h'(\mathbf{y}|\mathbf{x}) \tag{27}$$

where $h'(\cdot)$ denotes the differential entropy rate

$$h'(\cdot) = \lim_{N \to \infty} \frac{1}{N} h(\cdot). \tag{28}$$

In Section III-B, we give a lower and an upper bound on the channel output entropy rate $h'(\mathbf{y})$, which
are independent of the PSD of the channel fading process $S_h(f)$. In Section III-C, we derive an upper
bound and a lower bound on $h'(\mathbf{y}|\mathbf{x})$. The upper bound, which is already known from [16], holds for
an arbitrary PSD of the channel fading process with compact support. For the lower bound on $h'(\mathbf{y}|\mathbf{x})$
we find a closed form expression only for the special case of a rectangular PSD.



B. The Received Signal Entropy Rate h'(y) 

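1) Lower Bound on $h'(\mathbf{y})$: The mutual information with perfect channel state information at the receiver
can be upper-bounded by

\begin{align}
\mathcal{I}(\mathbf{y}; \mathbf{x}|\mathbf{h}) &= h(\mathbf{y}|\mathbf{h}) - h(\mathbf{y}|\mathbf{h}, \mathbf{x}) \notag \\
&\le h(\mathbf{y}) - h(\mathbf{y}|\mathbf{h}, \mathbf{x}). \tag{29}
\end{align}

Here, we make use of the fact that conditioning reduces entropy. Thus, we can lower-bound the entropy
rate $h'(\mathbf{y})$ by

$$h'(\mathbf{y}) \ge \mathcal{I}'(\mathbf{y}; \mathbf{x}|\mathbf{h}) + h'(\mathbf{y}|\mathbf{h}, \mathbf{x}). \tag{30}$$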
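The mutual information rate in case the channel is known at the receiver, i.e., the first term on the RHS
of (30), is given by

\begin{align}
\mathcal{I}'(\mathbf{y}; \mathbf{x}|\mathbf{h}) &= \lim_{N \to \infty} \frac{1}{N} \mathrm{E}_{\mathbf{h}}\left[ \mathrm{E}_{\mathbf{y}, \mathbf{x}}\left[ \log \frac{p(\mathbf{y}|\mathbf{h}, \mathbf{x})}{p(\mathbf{y}|\mathbf{h})} \,\middle|\, \mathbf{h} \right] \right] \notag \\
&\overset{(a)}{=} \mathrm{E}_{h_k}\left[ \mathrm{E}_{y_k, x_k}\left[ \log \frac{p(y_k|h_k, x_k)}{p(y_k|h_k)} \,\middle|\, h_k \right] \right] \notag \\
&\overset{(b)}{=} \mathrm{E}_{h_k}\left[ \log\left(1 + \rho \frac{|h_k|^2}{\sigma_h^2}\right) \right] \notag \\
&= \int_{z=0}^{\infty} \log(1 + \rho z)\, e^{-z}\, dz \tag{31}
\end{align}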

where (a) is based on the fact that, due to conditioning on the channel fading vector $\mathbf{h}$, the channel uses
become independent, and we furthermore assume i.i.d. input symbols.² Therefore, we can drop the time
index for ease of notation. Finally, (b) holds for i.i.d. zero-mean proper Gaussian inputs, which are capacity
achieving in the coherent case. Thus, the RHS of (31) is the capacity of the coherent channel. Notably,
the coherent capacity is independent of the temporal correlation of the channel, see, e.g., [?].
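The integral on the RHS of (31) is easy to evaluate numerically. The following minimal sketch uses a composite Simpson rule on a truncated interval; the truncation point `z_max`, the step count, and the function name are illustrative assumptions and not part of the derivation.

```python
import math

def coherent_rate(rho, z_max=60.0, steps=60000):
    """Approximate int_0^inf log(1 + rho z) exp(-z) dz in nat per channel use.

    Uses the composite Simpson rule on [0, z_max]; the neglected tail is of
    the order of exp(-z_max) and therefore negligible for z_max = 60.
    """
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of subintervals
    width = z_max / steps

    def integrand(z):
        return math.log1p(rho * z) * math.exp(-z)

    total = integrand(0.0) + integrand(z_max)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * width)
    return total * width / 3.0

if __name__ == "__main__":
    for snr_db in (0, 10, 20):
        rho = 10.0 ** (snr_db / 10.0)
        # By Jensen's inequality, the result stays below log(1 + rho).
        print(snr_db, coherent_rate(rho), math.log1p(rho))
```

The printed comparison with $\log(1 + \rho)$ shows the loss of the fading channel relative to an AWGN channel of the same average SNR.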
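The second term on the RHS of (30) originates from AWGN and, thus, can be calculated as

$$h'(\mathbf{y}|\mathbf{h}, \mathbf{x}) = \log\left(\pi e \sigma_n^2\right). \tag{32}$$

Hence, a lower bound on the entropy rate $h'(\mathbf{y})$ is given by

$$h'(\mathbf{y}) \ge h'_L(\mathbf{y}) = \int_{z=0}^{\infty} \log\left[\pi e \left(\sigma_n^2 + \sigma_x^2 \sigma_h^2 z\right)\right] e^{-z}\, dz. \tag{33}$$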

² All logarithms in this paper are to the base $e$ and, unless stated otherwise, all rates are in nat.
2) Upper Bound on $h'(\mathbf{y})$: In this section, we give an upper bound on the entropy rate $h'(\mathbf{y})$. First,
we make use of the fact that the entropy $h(\mathbf{y})$ of a zero-mean complex random vector $\mathbf{y}$ of dimension $N$
with nonsingular correlation matrix $\mathbf{R}_y = \mathrm{E}[\mathbf{y}\mathbf{y}^H]$ is upper-bounded by [18]

\begin{align}
h(\mathbf{y}) &\le \log\left[(\pi e)^N \det(\mathbf{R}_y)\right] \notag \\
&\overset{(a)}{=} N \log\left[\pi e \left(\sigma_x^2 \sigma_h^2 + \sigma_n^2\right)\right]. \tag{34}
\end{align}

Here (a) follows from the fact that $\mathbf{R}_y$ is diagonal and given by

$$\mathbf{R}_y = \left(\sigma_x^2 \sigma_h^2 + \sigma_n^2\right) \mathbf{I}_N \tag{35}$$

due to the assumption of i.i.d. input symbols. Nevertheless, the upper bound on $h(\mathbf{y})$ in (34) also holds
without the assumption of independent input symbols, which can be easily verified by using Hadamard's
inequality instead of the equality in (a).

Hence, with (34) the upper bound $h'_U(\mathbf{y})$ on the entropy rate $h'(\mathbf{y})$ is given by

$$h'(\mathbf{y}) \le h'_U(\mathbf{y}) = \log\left(\pi e \left(\sigma_x^2 \sigma_h^2 + \sigma_n^2\right)\right). \tag{36}$$

In Appendix [?] we give another upper bound on $h'(\mathbf{y})$ for the case of zero-mean proper Gaussian
inputs based on numerical integration to calculate $h(y_k)$, i.e., the output entropy at an individual time
instant, see also [21]. As this bound can only be evaluated numerically using Hermite polynomials and
Simpson's rule or by Monte Carlo integration, we do not further consider it here.
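To get a feeling for how far the bounds (33) and (36) on $h'(\mathbf{y})$ are apart, both can be evaluated numerically. The sketch below is a minimal illustration under assumed parameter values; the helper names are hypothetical. Since $\int_0^\infty \log(z)\, e^{-z}\, dz = -\gamma$, the gap $h'_U(\mathbf{y}) - h'_L(\mathbf{y}) = \log(1+\rho) - \int_0^\infty \log(1+\rho z)\, e^{-z}\, dz$ tends to the Euler constant $\gamma$ as the SNR grows.

```python
import math

EULER_GAMMA = 0.5772156649015329

def simpson(func, lower, upper, steps=60000):
    """Composite Simpson rule; steps must be even."""
    width = (upper - lower) / steps
    total = func(lower) + func(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(lower + i * width)
    return total * width / 3.0

def h_lower(sigma_x2, sigma_h2, sigma_n2):
    """Lower bound (33) on the output entropy rate, integral truncated at z = 60."""
    def integrand(z):
        return math.log(math.pi * math.e * (sigma_n2 + sigma_x2 * sigma_h2 * z)) * math.exp(-z)
    return simpson(integrand, 0.0, 60.0)

def h_upper(sigma_x2, sigma_h2, sigma_n2):
    """Upper bound (36) on the output entropy rate."""
    return math.log(math.pi * math.e * (sigma_x2 * sigma_h2 + sigma_n2))

if __name__ == "__main__":
    for rho in (0.1, 1.0, 10.0, 100.0):
        # With sigma_h2 = sigma_n2 = 1, the average SNR equals sigma_x2.
        gap = h_upper(rho, 1.0, 1.0) - h_lower(rho, 1.0, 1.0)
        print(rho, gap, EULER_GAMMA)
```

For the listed SNR values the gap increases with $\rho$ and stays below $\gamma \approx 0.57721$ nat per channel use.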



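C. The Entropy Rate $h'(\mathbf{y}|\mathbf{x})$

In this section, we give an upper bound and a lower bound on the conditional channel output entropy
rate $h'(\mathbf{y}|\mathbf{x})$. The probability density of $\mathbf{y}$ conditioned on $\mathbf{x}$ is zero-mean proper Gaussian. Therefore, its
entropy is

$$h(\mathbf{y}|\mathbf{x}) = \mathrm{E}_{\mathbf{x}}\left[\log\left((\pi e)^N \det(\mathbf{R}_{y|x})\right)\right] \tag{37}$$

where the covariance matrix $\mathbf{R}_{y|x}$ is given by

\begin{align}
\mathbf{R}_{y|x} &= \mathrm{E}_{\mathbf{h}, \mathbf{n}}\left[\mathbf{y}\mathbf{y}^H \,\middle|\, \mathbf{x}\right] \notag \\
&= \mathrm{E}_{\mathbf{h}}\left[\mathbf{X}\mathbf{h}\mathbf{h}^H\mathbf{X}^H \,\middle|\, \mathbf{x}\right] + \sigma_n^2 \mathbf{I}_N \notag \\
&= \mathbf{X}\mathbf{R}_h\mathbf{X}^H + \sigma_n^2 \mathbf{I}_N. \tag{38}
\end{align}

As the channel correlation matrix $\mathbf{R}_h$ is Hermitian and, thus, normal, the spectral decomposition theorem
applies, i.e.,

$$\mathbf{R}_h = \mathbf{U}\mathbf{\Lambda}_h\mathbf{U}^H \tag{39}$$

where the diagonal matrix $\mathbf{\Lambda}_h = \mathrm{diag}(\lambda_1, \ldots, \lambda_N)$ contains the eigenvalues $\lambda_i$ of $\mathbf{R}_h$ and the matrix $\mathbf{U}$
is unitary.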
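1) Upper Bound on $h'(\mathbf{y}|\mathbf{x})$: The following upper-bounding of $h(\mathbf{y}|\mathbf{x})$ is already known from [16].
Making use of (39), Jensen's inequality, and the concavity of the log function, we can upper-bound $h(\mathbf{y}|\mathbf{x})$
in (37) as follows:

\begin{align}
h(\mathbf{y}|\mathbf{x}) &= \mathrm{E}_{\mathbf{x}}\left[\log\det\left(\frac{1}{\sigma_n^2}\mathbf{X}\mathbf{U}\mathbf{\Lambda}_h\mathbf{U}^H\mathbf{X}^H + \mathbf{I}_N\right)\right] + N\log\left(\pi e \sigma_n^2\right) \tag{40} \\
&\overset{(a)}{=} \mathrm{E}_{\mathbf{x}}\left[\log\det\left(\frac{1}{\sigma_n^2}\mathbf{X}^H\mathbf{X}\mathbf{U}\mathbf{\Lambda}_h\mathbf{U}^H + \mathbf{I}_N\right)\right] + N\log\left(\pi e \sigma_n^2\right) \tag{41} \\
&\overset{(b)}{\le} \log\det\left(\frac{\sigma_x^2}{\sigma_n^2}\mathbf{U}\mathbf{\Lambda}_h\mathbf{U}^H + \mathbf{I}_N\right) + N\log\left(\pi e \sigma_n^2\right) \notag \\
&= \log\det\left(\frac{\sigma_x^2}{\sigma_n^2}\mathbf{\Lambda}_h + \mathbf{I}_N\right) + N\log\left(\pi e \sigma_n^2\right) \notag \\
&= \sum_{i=1}^{N} \log\left(\frac{\sigma_x^2}{\sigma_n^2}\lambda_i + 1\right) + N\log\left(\pi e \sigma_n^2\right). \tag{42}
\end{align}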

For (a) the following relation is used:

$$\det(\mathbf{A}\mathbf{B} + \mathbf{I}) = \det(\mathbf{B}\mathbf{A} + \mathbf{I}) \tag{43}$$

which holds as $\mathbf{A}\mathbf{B}$ has the same eigenvalues as $\mathbf{B}\mathbf{A}$ for $\mathbf{A}$ and $\mathbf{B}$ being square matrices [22, Theorem
1.3.20]. For (b) we have used the fact that $\log\det(\cdot)$ is concave on the set of positive definite matrices.³
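The determinant identity used in step (a) can be checked on a small example. The following sketch uses two arbitrarily chosen $3 \times 3$ integer matrices and a plain Laplace expansion; the matrices and helper names are illustrative assumptions only.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def add_identity(a):
    """Return a + I."""
    return [[a[i][j] + (1 if i == j else 0) for j in range(len(a))]
            for i in range(len(a))]

def det(m):
    """Determinant by Laplace expansion along the first row (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for col, value in enumerate(m[0]):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * value * det(minor)
    return total

if __name__ == "__main__":
    mat_a = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]
    mat_b = [[2, 0, 1], [1, 1, 0], [0, 3, 1]]
    lhs = det(add_identity(matmul(mat_a, mat_b)))
    rhs = det(add_identity(matmul(mat_b, mat_a)))
    print(lhs, rhs)  # both determinants agree although AB and BA differ
```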

To calculate the bound on the entropy rate $h'(\mathbf{y}|\mathbf{x})$, we consider the case $N \to \infty$, i.e., the dimension
of the matrix $\mathbf{\Lambda}_h$ grows without bound. As $\mathbf{R}_h$ is Hermitian Toeplitz, we can evaluate (42) using Szegö's
theorem on the asymptotic eigenvalue distribution of Hermitian Toeplitz matrices [23], [24]. Consequently,

$$\lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \log\left(\frac{\sigma_x^2}{\sigma_n^2}\lambda_i + 1\right) = \int_{-\frac{1}{2}}^{\frac{1}{2}} \log\left(S_h(f)\frac{\sigma_x^2}{\sigma_n^2} + 1\right) df. \tag{44}$$

Notice that due to the assumption that the PSD exists, the autocorrelation function of the channel fading
process is square summable, see (5), and, thus, Szegö's theorem can be applied.
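The convergence stated by Szegö's theorem can be observed numerically for moderate matrix sizes. The sketch below is a minimal illustration assuming a rectangular PSD with level $\sigma_h^2/(2f_d)$ on $[-f_d, f_d]$, whose autocorrelation is $\sigma_h^2 \sin(2\pi f_d l)/(2\pi f_d l)$. It computes $\frac{1}{N}\log\det\left(\frac{\sigma_x^2}{\sigma_n^2}\mathbf{R}_h + \mathbf{I}_N\right)$ with a plain Cholesky factorization and compares it with the integral expression. The function names, the parameter `snr_scale` (standing for $\sigma_x^2/\sigma_n^2$), and the chosen values are assumptions made for illustration.

```python
import math

def rect_autocorr(lag, f_d, sigma_h2=1.0):
    """Autocorrelation belonging to the assumed rectangular PSD."""
    if lag == 0:
        return sigma_h2
    arg = 2.0 * math.pi * f_d * lag
    return sigma_h2 * math.sin(arg) / arg

def logdet_cholesky(mat):
    """log det of a real symmetric positive definite matrix via Cholesky."""
    size = len(mat)
    low = [[0.0] * size for _ in range(size)]
    logdet = 0.0
    for i in range(size):
        for j in range(i + 1):
            s = mat[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(s)
                logdet += 2.0 * math.log(low[i][i])
            else:
                low[i][j] = s / low[j][j]
    return logdet

def finite_n_average(size, f_d, snr_scale):
    """(1/N) log det(snr_scale * R_h + I_N) for the N x N Toeplitz matrix R_h."""
    r = [rect_autocorr(lag, f_d) for lag in range(size)]
    mat = [[snr_scale * r[abs(i - j)] + (1.0 if i == j else 0.0)
            for j in range(size)] for i in range(size)]
    return logdet_cholesky(mat) / size

def szego_limit(f_d, snr_scale, sigma_h2=1.0):
    """Integral of log(S_h(f) * snr_scale + 1) over |f| < 0.5 for the rectangular PSD."""
    return 2.0 * f_d * math.log(snr_scale * sigma_h2 / (2.0 * f_d) + 1.0)

if __name__ == "__main__":
    for size in (16, 64, 128):
        print(size, finite_n_average(size, 0.2, 10.0), szego_limit(0.2, 10.0))
```

The finite-$N$ averages approach the integral as $N$ grows; the remaining difference reflects the eigenvalues in the transition region of the PSD.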
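Hence, we get the following upper bound:

$$h'(\mathbf{y}|\mathbf{x}) \le h'_U(\mathbf{y}|\mathbf{x}) = \int_{-\frac{1}{2}}^{\frac{1}{2}} \log\left(S_h(f)\frac{\sigma_x^2}{\sigma_n^2} + 1\right) df + \log\left(\pi e \sigma_n^2\right). \tag{45}$$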
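Inserting the upper bound (45) and the lower bound (33) into (27) gives a lower bound on the mutual information rate. As a sketch under the assumption of a rectangular PSD with level $\sigma_h^2/(2f_d)$ on $[-f_d, f_d]$, the integral in (45) reduces to $2f_d \log\left(1 + \rho/(2f_d)\right)$, so that the difference becomes $\int_0^\infty \log(1+\rho z)\, e^{-z}\, dz - 2f_d \log\left(1 + \rho/(2f_d)\right)$. The function names and parameter values below are illustrative assumptions.

```python
import math

def coherent_rate(rho, z_max=60.0, steps=60000):
    """int_0^inf log(1 + rho z) exp(-z) dz via Simpson's rule on [0, z_max]."""
    width = z_max / steps  # steps is even

    def integrand(z):
        return math.log1p(rho * z) * math.exp(-z)

    total = integrand(0.0) + integrand(z_max)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * width)
    return total * width / 3.0

def uncertainty_penalty_rect(rho, f_d):
    """Integral term of (45) for the assumed rectangular PSD."""
    return 2.0 * f_d * math.log1p(rho / (2.0 * f_d))

def rate_lower_bound_rect(rho, f_d):
    """h'_L(y) - h'_U(y|x) in nat per channel use (no clipping at zero applied)."""
    return coherent_rate(rho) - uncertainty_penalty_rect(rho, f_d)

if __name__ == "__main__":
    f_d = 0.1
    for snr_db in (10, 20, 30):
        rho = 10.0 ** (snr_db / 10.0)
        print(snr_db, rate_lower_bound_rect(rho, f_d))
```

Between 20 dB and 30 dB the value increases by approximately $(1 - 2f_d)\log 10$ for $f_d = 0.1$, which is consistent with the pre-log of $1 - 2f_d$ stated in the introduction.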



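At this point, it is interesting to note that for constant modulus (CM) input symbols the differential
entropy rate $h'(\mathbf{y}|\mathbf{x})$ is equal to the upper bound $h'_U(\mathbf{y}|\mathbf{x})$, i.e.,

$$h'(\mathbf{y}|\mathbf{x})\big|_{\mathrm{CM}} = h'_U(\mathbf{y}|\mathbf{x}) \tag{46}$$

as in this case (41) simplifies due to

$$\mathbf{X}^H\mathbf{X}\big|_{\mathrm{CM}} = \sigma_x^2 \mathbf{I}_N \tag{47}$$

and, thus, (b) succeeding (41) holds with equality.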



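³ For the special case of independent transmit symbols, (b) can also be shown in two steps by using Jensen's inequality and, in a second
step, expressing the determinant by a Laplacian expansion by minors to calculate the expectation, i.e.,

\begin{align*}
\mathrm{E}_{\mathbf{x}}\left[\log\det\left(\frac{1}{\sigma_n^2}\mathbf{X}^H\mathbf{X}\mathbf{U}\mathbf{\Lambda}_h\mathbf{U}^H + \mathbf{I}_N\right)\right] &\le \log \mathrm{E}_{\mathbf{x}}\left[\det\left(\frac{1}{\sigma_n^2}\mathbf{X}^H\mathbf{X}\mathbf{U}\mathbf{\Lambda}_h\mathbf{U}^H + \mathbf{I}_N\right)\right] \\
&= \log\det\left(\frac{\sigma_x^2}{\sigma_n^2}\mathbf{U}\mathbf{\Lambda}_h\mathbf{U}^H + \mathbf{I}_N\right).
\end{align*}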
2) Lower Bound on $h'(\mathbf{y}|\mathbf{x})$ for a Rectangular PSD: In this section, we give a new lower bound on
the entropy rate $h'(\mathbf{y}|\mathbf{x})$ for the special case of a rectangular PSD, which is a common approximation of
the actual PSD in typical system design.

For the derivation of this lower bound, we derive a circulant matrix which is asymptotically equivalent to
the Toeplitz matrix $\mathbf{R}_h$. Hereby, we follow a specific approach as shown in [23], where the circulant matrix
$\mathbf{C}_h^{(N)}$ is constructed by sampling the PSD of the channel fading process. For the discussion of asymptotic
equivalence, we write $\mathbf{R}_h^{(N)}$ instead of $\mathbf{R}_h$, where the superscript $(N)$ denotes the size of the square matrix $\mathbf{R}_h$.
Let the first column of the circulant matrix $\mathbf{C}_h^{(N)}$ be given by

$$\left[c_0^{(N)}\ c_1^{(N)}\ \ldots\ c_{N-1}^{(N)}\right]^T \tag{48}$$

where again the superscript $(N)$ denotes the size of the square matrix $\mathbf{C}_h^{(N)}$. The elements $c_k^{(N)}$ are given
by

$$c_k^{(N)} = \frac{1}{N} \sum_{l=0}^{N-1} \tilde{S}_h\left(\frac{l}{N}\right) e^{j 2\pi \frac{l k}{N}} \tag{49}$$

where $\tilde{S}_h(f)$ is the periodic continuation of $S_h(f)$ given in (3), i.e.,

$$\tilde{S}_h(f) = \sum_{k=-\infty}^{\infty} \delta(f - k) \star S_h(f) \tag{50}$$

and $S_h(f)$ being zero outside the interval $|f| \le 0.5$ for which it is defined in (3).

As we assume that the autocorrelation function of the channel fading process is absolutely summable, the PSD of the channel fading process $S_h(f)$ is Riemann integrable, and it holds that

$$\lim_{N\to\infty}c_k^{(N)}=\lim_{N\to\infty}\frac{1}{N}\sum_{l=0}^{N-1}\tilde{S}_h\left(\frac{l}{N}\right)e^{j2\pi\frac{lk}{N}}=\int_{-\frac{1}{2}}^{\frac{1}{2}}S_h(f)e^{j2\pi kf}df=r_h(k) \tag{51}$$



with r h (k) given by ©. 

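As the eigenvectors of a circulant matrix are given by a discrete Fourier transform (DFT), the eigenvalues $\tilde{\lambda}_k^{(N)}$ with $k=1,\ldots,N$ of the circulant matrix $\mathbf{C}_h^{(N)}$ are given by

$$\tilde{\lambda}_k^{(N)}=\sum_{l=0}^{N-1}c_l^{(N)}e^{-j2\pi\frac{l(k-1)}{N}}=\sum_{l=0}^{N-1}\left(\frac{1}{N}\sum_{m=0}^{N-1}\tilde{S}_h\left(\frac{m}{N}\right)e^{j2\pi\frac{ml}{N}}\right)e^{-j2\pi\frac{l(k-1)}{N}}=\sum_{m=0}^{N-1}\tilde{S}_h\left(\frac{m}{N}\right)\left\{\frac{1}{N}\sum_{l=0}^{N-1}e^{j2\pi\frac{l(m-(k-1))}{N}}\right\}=\tilde{S}_h\left(\frac{k-1}{N}\right). \tag{52}$$

Consequently, the spectral decomposition of the circulant matrix is given by

$$\mathbf{C}_h^{(N)}=\mathbf{F}^{(N)}\tilde{\mathbf{\Lambda}}_h^{(N)}\left(\mathbf{F}^{(N)}\right)^H \tag{53}$$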
where the matrix $\mathbf{F}^{(N)}$ is a unitary DFT matrix, i.e., its elements are given by

$$\left[\mathbf{F}^{(N)}\right]_{k,l}=\frac{1}{\sqrt{N}}e^{j2\pi\frac{(k-1)(l-1)}{N}}. \tag{54}$$

Furthermore, the matrix $\tilde{\mathbf{\Lambda}}_h^{(N)}$ is diagonal with the elements $\tilde{\lambda}_k^{(N)}$ given in (52).

In [23, Lemma 4.6] it is shown that the circulant matrix $\mathbf{C}_h^{(N)}$ and the Toeplitz matrix $\mathbf{R}_h^{(N)}$ are asymptotically equivalent if the autocorrelation function $r_h(l)$ is absolutely summable. In the context of proving this lemma it is shown that the weak norm of the difference of $\mathbf{R}_h^{(N)}$ and $\mathbf{C}_h^{(N)}$ converges to zero as $N\to\infty$, i.e.,

$$\lim_{N\to\infty}\left|\mathbf{R}_h^{(N)}-\mathbf{C}_h^{(N)}\right|=0 \tag{55}$$

where the weak norm of a matrix $\mathbf{B}$ is defined as

$$|\mathbf{B}|=\sqrt{\frac{1}{N}\mathrm{Tr}\left[\mathbf{B}^H\mathbf{B}\right]}. \tag{56}$$

The convergence of the weak norm of the difference $\mathbf{R}_h^{(N)}-\mathbf{C}_h^{(N)}$ towards zero is required later on.



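By the construction of the circulant matrix $\mathbf{C}_h^{(N)}$, the eigenvalues $\tilde{\lambda}_k^{(N)}$ of $\mathbf{C}_h^{(N)}$ are given by (52), i.e.,

$$\tilde{\lambda}_k^{(N)}=\begin{cases}S_h\left(\frac{k-1}{N}\right) & \text{for } 1\le k\le\left\lceil\frac{N}{2}\right\rceil\\ S_h\left(\frac{k-1}{N}-1\right) & \text{for } \left\lceil\frac{N}{2}\right\rceil<k\le N.\end{cases} \tag{57}$$

Thus, if the PSD of the channel fading process $S_h(f)$ is rectangular, the eigenvalues of the circulant matrix are given by

$$\tilde{\lambda}_k^{(N)}=\begin{cases}\frac{\sigma_h^2}{2f_d} & \text{for } 1\le k\le\lfloor f_dN\rfloor+1\ \vee\ \lceil(1-f_d)N+1\rceil\le k\le N\\ 0 & \text{otherwise.}\end{cases} \tag{58}$$

This means that the eigenvalues of $\mathbf{R}_h^{(N)}$ corresponding to frequencies $|f|>f_d$ become zero for $N\to\infty$.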
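The sampling construction in (57) and (58) can be illustrated with exact rational arithmetic. The sketch below assumes $\sigma_h^2=1$ and uses hypothetical helper names; it samples the periodically continued rectangular PSD at the frequencies $(k-1)/N$ and counts the non-zero eigenvalues, which should equal $2\lfloor f_dN\rfloor+1$.

```python
from fractions import Fraction
from math import floor

def rect_psd(f, f_d, sigma_h2=Fraction(1)):
    """Rectangular PSD: sigma_h2 / (2 f_d) for |f| <= f_d, zero elsewhere."""
    return sigma_h2 / (2 * f_d) if abs(f) <= f_d else Fraction(0)

def circulant_eigenvalues(n, f_d):
    """Eigenvalues as in (57): PSD samples at (k-1)/N, wrapped by one period."""
    half = -(-n // 2)  # ceil(N / 2)
    eigenvalues = []
    for k in range(1, n + 1):
        f = Fraction(k - 1, n)
        if k > half:
            f -= 1  # periodic continuation maps the upper half to negative f
        eigenvalues.append(rect_psd(f, f_d))
    return eigenvalues

def count_nonzero(n, f_d):
    return sum(1 for v in circulant_eigenvalues(n, f_d) if v != 0)

if __name__ == "__main__":
    f_d = Fraction(1, 10)
    for n in (100, 105, 257):
        print(n, count_nonzero(n, f_d), 2 * floor(f_d * n) + 1)
```

Using `Fraction` avoids floating-point ambiguity at the band edge $|f|=f_d$, which matters when $f_dN$ is an integer.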

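Now, we apply the asymptotic equivalence of $\mathbf{R}_h^{(N)}$ and $\mathbf{C}_h^{(N)}$ to lower-bound the entropy rate $h'(\mathbf{y}|\mathbf{x})$ given by

$$h'(\mathbf{y}|\mathbf{x})=\lim_{N\to\infty}\frac{1}{N}h(\mathbf{y}|\mathbf{x}) \tag{59}$$

with $h(\mathbf{y}|\mathbf{x})$ given in (41). Thus, we have to show that

$$\lim_{N\to\infty}\frac{1}{N}\mathrm{E}_{\mathbf{x}}\left[\log\det\left(\frac{1}{\sigma_n^2}\mathbf{X}^H\mathbf{X}\mathbf{R}_h^{(N)}+\mathbf{I}_N\right)\right]=\lim_{N\to\infty}\frac{1}{N}\mathrm{E}_{\mathbf{x}}\left[\log\det\left(\frac{1}{\sigma_n^2}\mathbf{X}^H\mathbf{X}\mathbf{C}_h^{(N)}+\mathbf{I}_N\right)\right]. \tag{60}$$

To prove (60), we have to show that the matrices

$$\mathbf{K}_1^{(N)}=\frac{1}{\sigma_n^2}\mathbf{X}^H\mathbf{X}\mathbf{R}_h^{(N)}+\mathbf{I}_N \tag{61}$$

$$\mathbf{K}_2^{(N)}=\frac{1}{\sigma_n^2}\mathbf{X}^H\mathbf{X}\mathbf{C}_h^{(N)}+\mathbf{I}_N \tag{62}$$

are asymptotically equivalent [23, Theorem 2.4]. This means that we have to show that both matrices are bounded in the strong norm, and that the weak norm of their difference converges to zero for $N\to\infty$ [23, Section 2.3].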
Concerning the condition with respect to the strong norm we have to show that

$$\left\|\mathbf{K}_1^{(N)}\right\|<\infty \tag{63}$$

$$\left\|\mathbf{K}_2^{(N)}\right\|<\infty \tag{64}$$

with the strong norm of the matrix $\mathbf{B}$ defined by

$$\|\mathbf{B}\|^2=\max_k\gamma_k \tag{65}$$

where $\gamma_k$ are the eigenvalues of the Hermitian nonnegative definite matrix $\mathbf{B}\mathbf{B}^H$. The diagonal matrix $\mathbf{X}^H\mathbf{X}$ contains the transmit powers of the individual transmit symbols. In the case of Gaussian input distributions, for a given $\epsilon>0$, there exists a finite value $M(\epsilon)$ such that the transmit power is smaller than $M(\epsilon)$ with probability $1-\epsilon$. In addition, the strong norms of $\mathbf{R}_h^{(N)}$ and $\mathbf{C}_h^{(N)}$ are bounded, too. Concerning the boundedness of the eigenvalues of the Hermitian Toeplitz matrix $\mathbf{R}_h^{(N)}$ see [23, Lemma 4.1]. Thus, the strong norms of $\mathbf{K}_1^{(N)}$ and $\mathbf{K}_2^{(N)}$ are asymptotically almost surely bounded.



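Furthermore, for the weak norm of the difference $\mathbf{K}_1^{(N)}-\mathbf{K}_2^{(N)}$ we get for $N\to\infty$

$$\left|\mathbf{K}_1^{(N)}-\mathbf{K}_2^{(N)}\right|=\left|\frac{1}{\sigma_n^2}\mathbf{X}^H\mathbf{X}\left(\mathbf{R}_h^{(N)}-\mathbf{C}_h^{(N)}\right)\right|\overset{(a)}{\le}\frac{1}{\sigma_n^2}\left\|\mathbf{X}^H\mathbf{X}\right\|\left|\mathbf{R}_h^{(N)}-\mathbf{C}_h^{(N)}\right| \tag{66}$$

where for (a) we have used [23, Lemma 2.3].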

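Based on the above argumentation that $\|\mathbf{X}^H\mathbf{X}\|$ is asymptotically almost surely bounded, we get for $N\to\infty$

$$\lim_{N\to\infty}\left|\mathbf{K}_1^{(N)}-\mathbf{K}_2^{(N)}\right|\le\lim_{N\to\infty}\frac{1}{\sigma_n^2}\left\|\mathbf{X}^H\mathbf{X}\right\|\left|\mathbf{R}_h^{(N)}-\mathbf{C}_h^{(N)}\right|=0 \tag{67}$$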
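due to (55). Thus, we have proved that (60) holds and we can express the entropy rate $h'(\mathbf{y}|\mathbf{x})$ by

$$h'(\mathbf{y}|\mathbf{x})=\lim_{N\to\infty}\frac{1}{N}\mathrm{E}_{\mathbf{x}}\left[\log\det\left(\frac{1}{\sigma_n^2}\mathbf{X}^H\mathbf{X}\mathbf{C}_h^{(N)}+\mathbf{I}_N\right)\right]+\log\left(\pi e\sigma_n^2\right)=\lim_{N\to\infty}\frac{1}{N}\mathrm{E}_{\mathbf{x}}\left[\log\det\left(\frac{1}{\sigma_n^2}\mathbf{X}^H\mathbf{X}\mathbf{F}\tilde{\mathbf{\Lambda}}_h\mathbf{F}^H+\mathbf{I}_N\right)\right]+\log\left(\pi e\sigma_n^2\right). \tag{68}$$

Here, $\mathbf{F}\tilde{\mathbf{\Lambda}}_h\mathbf{F}^H$ is the spectral decomposition of the circulant matrix $\mathbf{C}_h^{(N)}$, see (53) (from here on we again omit the superscript $(N)$ for ease of notation). Thus, $\tilde{\mathbf{\Lambda}}_h$ is a diagonal matrix containing the eigenvalues $\tilde{\lambda}_k$ as given in (58) and the matrix $\mathbf{F}$ is a unitary matrix with the eigenvectors of $\mathbf{C}_h$ on its columns.

To calculate a lower bound on $h'(\mathbf{y}|\mathbf{x})$, we transform the term with the expectation operation at the RHS of (68) as follows: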



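$$\mathrm{E}_{\mathbf{x}}\left[\log\det\left(\frac{1}{\sigma_n^2}\mathbf{X}^H\mathbf{X}\mathbf{F}\tilde{\mathbf{\Lambda}}_h\mathbf{F}^H+\mathbf{I}_N\right)\right]\overset{(a)}{=}\mathrm{E}_{\mathbf{x}}\left[\log\det\left(\frac{1}{\sigma_n^2}\tilde{\mathbf{\Lambda}}_h\mathbf{F}^H\mathbf{X}^H\mathbf{X}\mathbf{F}+\mathbf{I}_N\right)\right]\overset{(b)}{=}\mathrm{E}_{\mathbf{x}}\left[\log\det\left(\frac{\sigma_h^2}{2f_d\sigma_n^2}\tilde{\mathbf{F}}^H\mathbf{X}^H\mathbf{X}\tilde{\mathbf{F}}+\mathbf{I}_{2\lfloor f_dN\rfloor+1}\right)\right] \tag{69}$$

where for (a) we have used (43). For (b) the eigenvalue distribution in (58) is used, and the matrix $\tilde{\mathbf{F}}$ is given by

$$\tilde{\mathbf{F}}=\left[\mathbf{f}_1,\ldots,\mathbf{f}_{\lfloor f_dN\rfloor+1},\mathbf{f}_{\lceil(1-f_d)N+1\rceil},\ldots,\mathbf{f}_N\right]\in\mathbb{C}^{N\times(2\lfloor f_dN\rfloor+1)} \tag{70}$$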
where the $\mathbf{f}_i$ are the orthonormal columns of the unitary matrix $\mathbf{F}$. I.e., $\tilde{\mathbf{F}}$ contains the eigenvectors corresponding to the non-zero eigenvalues of $\mathbf{C}_h$. Now, we apply the following inequality given in [25, Lemma 1].

Lemma 1: Let $\mathbf{A}\in\mathbb{C}^{m\times n}$ with orthonormal rows and $m\le n$. Then

$$\log\det\left(\mathbf{A}\,\mathrm{diag}(p_1,\ldots,p_n)\,\mathbf{A}^H\right)\ge\mathrm{trace}\left[\mathbf{A}\,\mathrm{diag}(\log p_1,\ldots,\log p_n)\,\mathbf{A}^H\right] \tag{71}$$

if $p_1,\ldots,p_n>0$.

With Lemma 1, we can lower-bound (69) such that



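$$\mathrm{E}_{\mathbf{x}}\left[\log\det\left(\frac{\sigma_h^2}{2f_d\sigma_n^2}\tilde{\mathbf{F}}^H\mathbf{X}^H\mathbf{X}\tilde{\mathbf{F}}+\mathbf{I}_{2\lfloor f_dN\rfloor+1}\right)\right]\ge\mathrm{E}_{\mathbf{x}}\left[\mathrm{trace}\left[\tilde{\mathbf{F}}^H\mathrm{diag}\left(\log\left(\frac{\sigma_h^2}{2f_d\sigma_n^2}|x_1|^2+1\right),\ldots,\log\left(\frac{\sigma_h^2}{2f_d\sigma_n^2}|x_N|^2+1\right)\right)\tilde{\mathbf{F}}\right]\right]$$

$$=\mathrm{trace}\left[\tilde{\mathbf{F}}^H\mathrm{diag}\left(\mathrm{E}_x\log\left(\frac{\sigma_h^2}{2f_d\sigma_n^2}|x_1|^2+1\right),\ldots,\mathrm{E}_x\log\left(\frac{\sigma_h^2}{2f_d\sigma_n^2}|x_N|^2+1\right)\right)\tilde{\mathbf{F}}\right]\overset{(a)}{=}\sum_{k=1}^{2\lfloor f_dN\rfloor+1}\mathrm{E}_x\log\left(\frac{\sigma_h^2}{2f_d\sigma_n^2}|x|^2+1\right) \tag{72}$$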



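where (a) results because all $x_k$ are identically distributed and because the columns of $\tilde{\mathbf{F}}$ are orthonormal. Hence, with (68) a lower bound on the entropy rate is given by

$$h'(\mathbf{y}|\mathbf{x})\ge\lim_{N\to\infty}\frac{1}{N}\sum_{k=1}^{2\lfloor f_dN\rfloor+1}\mathrm{E}_x\log\left(\frac{\sigma_h^2}{2f_d\sigma_n^2}|x|^2+1\right)+\log\left(\pi e\sigma_n^2\right)=2f_d\,\mathrm{E}_x\log\left(\frac{\sigma_h^2}{2f_d\sigma_n^2}|x|^2+1\right)+\log\left(\pi e\sigma_n^2\right)=h'_L(\mathbf{y}|\mathbf{x}). \tag{73}$$

Thus, we have found a lower bound on the entropy rate $h'(\mathbf{y}|\mathbf{x})$ for identically distributed (i.d.) input distributions. To the best of our knowledge, this is the only known lower bound on the entropy rate $h'(\mathbf{y}|\mathbf{x})$ which is not based on a peak power constraint.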
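The inequality of Lemma 1, on which the step from (69) to (72) rests, can be verified numerically for small examples. The following sketch is only an illustration under stated assumptions: the matrix $\mathbf{A}$ is built from selected rows of a unitary DFT matrix (mirroring the role of $\tilde{\mathbf{F}}^H$), the values $p_i$ are drawn at random, and all function names are hypothetical.

```python
import cmath
import math
import random

def dft_rows(n, rows):
    """Selected rows of the unitary n-point DFT matrix (orthonormal rows)."""
    return [[cmath.exp(2j * math.pi * r * c / n) / math.sqrt(n) for c in range(n)]
            for r in rows]

def logdet_hermitian_pd(matrix):
    """log det of a Hermitian positive definite matrix by Gaussian elimination."""
    m = [row[:] for row in matrix]
    size = len(m)
    total = 0.0
    for j in range(size):
        pivot = m[j][j]
        total += math.log(pivot.real)  # pivots are real and positive here
        for i in range(j + 1, size):
            factor = m[i][j] / pivot
            for k in range(j, size):
                m[i][k] -= factor * m[j][k]
    return total

def lemma1_sides(a, p):
    """Return (log det(A diag(p) A^H), trace[A diag(log p) A^H])."""
    rows, n = len(a), len(p)
    gram = [[sum(a[r][i] * p[i] * a[s][i].conjugate() for i in range(n))
             for s in range(rows)] for r in range(rows)]
    lhs = logdet_hermitian_pd(gram)
    rhs = sum(abs(a[r][i]) ** 2 * math.log(p[i])
              for r in range(rows) for i in range(n))
    return lhs, rhs

if __name__ == "__main__":
    rng = random.Random(0)
    a = dft_rows(8, [0, 1, 7])
    p = [rng.uniform(0.1, 10.0) for _ in range(8)]
    print(lemma1_sides(a, p))
```

Equality holds when all $p_i$ are identical, which corresponds to the constant modulus case discussed around (46).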

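For independent identically distributed (i.i.d.) zero-mean proper Gaussian input symbols the lower bound $h'_L(\mathbf{y}|\mathbf{x})$ becomes

$$h'_L(\mathbf{y}|\mathbf{x})=2f_d\int_{z=0}^{\infty}\log\left(\frac{\sigma_h^2\sigma_x^2}{2f_d\sigma_n^2}z+1\right)e^{-z}dz+\log\left(\pi e\sigma_n^2\right). \tag{74}$$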
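The integral in (74) arises because $|x|^2/\sigma_x^2$ is exponentially distributed with unit mean for proper Gaussian $x$. The sketch below, with hypothetical function names and an arbitrary example value of $c=\sigma_h^2\sigma_x^2/(2f_d\sigma_n^2)$, evaluates $\int_0^\infty\log(cz+1)e^{-z}dz$ by quadrature (using the substitution $z=e^u$, an implementation choice not taken from the source) and cross-checks it against a Monte Carlo average over Gaussian samples.

```python
import math
import random

def mean_log_quadrature(c, h=0.01):
    """Integral of log(1 + c z) exp(-z) over z >= 0, via the substitution z = exp(u)."""
    total = 0.0
    steps = int(round(44.0 / h))
    for i in range(steps + 1):
        u = -40.0 + i * h
        z = math.exp(u)
        total += math.log1p(c * z) * math.exp(u - z)
    return total * h

def mean_log_monte_carlo(c, samples=200_000, seed=1):
    """Average of log(1 + c |x|^2) over x ~ CN(0, 1)."""
    rng = random.Random(seed)
    std = math.sqrt(0.5)
    acc = 0.0
    for _ in range(samples):
        re, im = rng.gauss(0.0, std), rng.gauss(0.0, std)
        acc += math.log1p(c * (re * re + im * im))
    return acc / samples

def h_lower_gaussian(rho, f_d, sigma_n2=1.0):
    """Lower bound (74) in nats, with rho = sigma_h^2 sigma_x^2 / sigma_n^2."""
    return (2.0 * f_d * mean_log_quadrature(rho / (2.0 * f_d))
            + math.log(math.pi * math.e * sigma_n2))

if __name__ == "__main__":
    print(mean_log_quadrature(5.0), mean_log_monte_carlo(5.0))
    print(h_lower_gaussian(10.0, 0.1))
```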



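Discussion on the Assumption of a Rectangular PSD: For the case of constant modulus (CM) input distributions, it can be shown that the rectangular PSD maximizes $h'(\mathbf{y}|\mathbf{x})$ among all PSDs with a compact support interval $[-f_d,f_d]$ and a channel power $\sigma_h^2$. For the proof of this statement, we have to calculate $\sup_{S_h(f)\in\mathcal{S}}h'(\mathbf{y}|\mathbf{x})\big|_{\mathrm{CM}}$, where the set $\mathcal{S}$ of PSDs is given by

$$\mathcal{S}=\left\{S_h(f)\ \Big|\ S_h(f)=0\ \text{for}\ f_d<|f|\le 0.5,\ \int_{-f_d}^{f_d}S_h(f)df=\sigma_h^2\right\}. \tag{75}$$

With (45) and (46), we get

$$\sup_{S_h(f)\in\mathcal{S}}h'(\mathbf{y}|\mathbf{x})\big|_{\mathrm{CM}}=\sup_{S_h(f)\in\mathcal{S}}\int_{-f_d}^{f_d}\log\left(\pi e\left(S_h(f)\sigma_x^2+\sigma_n^2\right)\right)df+(1-2f_d)\log\left(\pi e\sigma_n^2\right) \tag{76}$$

$$\overset{(a)}{=}2f_d\log\left(\pi e\left(\frac{\sigma_h^2}{2f_d}\sigma_x^2+\sigma_n^2\right)\right)+(1-2f_d)\log\left(\pi e\sigma_n^2\right) \tag{77}$$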
i.e., the PSD $S_h(f)$ which maximizes $h'(\mathbf{y}|\mathbf{x})$ is rectangular,

$$S_h(f)=\begin{cases}\frac{\sigma_h^2}{2f_d} & \text{for } |f|\le f_d\\ 0 & \text{otherwise.}\end{cases} \tag{78}$$

The last step in (77) can be proven as follows. The argument of the supremum in (76) is concave on the convex set $\mathcal{S}$. To find the $S_h(f)$ that attains the supremum in (76), we define the functional

$$J(S_h)=\int_{-f_d}^{f_d}\log\left(\pi e\left(S_h(f)\sigma_x^2+\sigma_n^2\right)\right)df+c\left(\int_{-f_d}^{f_d}S_h(f)df-\sigma_h^2\right) \tag{79}$$

where $c$ is a constant and the last term accounts for the constraint

$$\int_{-f_d}^{f_d}S_h(f)df=\sigma_h^2. \tag{80}$$

For the maximizing $S_h(f)$, the following equation must be fulfilled for each $f$ within the interval $[-f_d,f_d]$:

$$\frac{\partial J}{\partial S_h(f)}=\frac{\sigma_x^2}{S_h(f)\sigma_x^2+\sigma_n^2}+c=0. \tag{81}$$

As this equation has to be fulfilled for each $f$ and constant $c$, $S_h(f)$ must be constant within the interval $[-f_d,f_d]$. As the second derivative of $J$ with respect to $S_h(f)$ is negative for all $S_h(f)$ included in $\mathcal{S}$, the given extremum is a maximum. Thus, with (80), (77) follows.
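The maximizing property of the rectangular PSD can be illustrated by comparing the integral in (76) for a few PSD shapes with identical support and power. In the sketch below, the triangular and raised-cosine shapes, the parameter values, and the function names are illustrative assumptions; only the rectangular shape appears in the text.

```python
import math

def cm_entropy_integral(psd, f_d, sigma_x2, sigma_n2, steps=20000):
    """Midpoint rule for the integral of log(pi e (S(f) sigma_x2 + sigma_n2)) over [-f_d, f_d]."""
    h = 2.0 * f_d / steps
    total = 0.0
    for i in range(steps):
        f = -f_d + (i + 0.5) * h
        total += math.log(math.pi * math.e * (psd(f) * sigma_x2 + sigma_n2))
    return total * h

F_D, SIGMA_H2, SIGMA_X2, SIGMA_N2 = 0.1, 1.0, 10.0, 1.0

def rectangular(f):
    return SIGMA_H2 / (2.0 * F_D)

def triangular(f):
    # Peak SIGMA_H2 / F_D at f = 0, so the area over [-F_D, F_D] is SIGMA_H2.
    return SIGMA_H2 / F_D * (1.0 - abs(f) / F_D)

def raised_cosine(f):
    # cos^2 shape, normalised so that the area over [-F_D, F_D] is SIGMA_H2.
    return SIGMA_H2 / F_D * math.cos(math.pi * f / (2.0 * F_D)) ** 2

if __name__ == "__main__":
    for name, psd in (("rectangular", rectangular), ("triangular", triangular),
                      ("raised cosine", raised_cosine)):
        print(name, cm_entropy_integral(psd, F_D, SIGMA_X2, SIGMA_N2))
```

The ordering follows from the concavity of the logarithm, which is the same property used in the variational argument around (79) to (81).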

We conjecture that a rectangular PSD of the channel fading process maximizes $h'(\mathbf{y}|\mathbf{x})$ for any i.i.d. input distribution with an average power $\sigma_x^2$, including the case of i.i.d. zero-mean proper Gaussian input symbols. Concerning this discussion see also [26, Section IV-A]. Consequently, the lower bound in (73) then holds only for a rectangular PSD. As this lower bound on $h'(\mathbf{y}|\mathbf{x})$ is finally used for the upper bound on $\mathcal{I}'(\mathbf{y};\mathbf{x})$, following the preceding conjecture, we get an upper bound on the achievable rate for a given maximum Doppler spread $f_d$ for the worst case PSD.

D. The Achievable Rate 

Based on the upper and lower bounds on $h'(\mathbf{y})$ and $h'(\mathbf{y}|\mathbf{x})$, we are now able to give upper and lower bounds on the achievable rate with i.i.d. zero-mean proper Gaussian input symbols.

1) Lower Bound: With (27), (33), and (45), we get the following lower bound on the capacity:

$$\mathcal{I}'(\mathbf{y};\mathbf{x})\ge h'_L(\mathbf{y})-h'_U(\mathbf{y}|\mathbf{x})=\int_{z=0}^{\infty}\log\left(1+\rho z\right)e^{-z}dz-\int_{-\frac{1}{2}}^{\frac{1}{2}}\log\left(1+\rho\frac{S_h(f)}{\sigma_h^2}\right)df=\mathcal{I}'_L(\mathbf{y};\mathbf{x}) \tag{82}$$

where $\rho$ is the average SNR as defined in (11). Notice that lower bounds on the achievable rate are also lower bounds on the capacity. Therefore, in the context of this lower bound we use the term capacity in the following. The lower bound in (82) is achievable with i.i.d. zero-mean proper Gaussian input symbols. As already stated, the lower bound on the capacity given in (82) is already known from [16]. The bound in (82) holds for an arbitrary PSD of the channel fading process with compact support. For the special case of a rectangular PSD the lower bound in (82) becomes

$$\mathcal{I}'_L(\mathbf{y};\mathbf{x})\big|_{\mathrm{Rect}}=\int_{0}^{\infty}\log\left(\rho z+1\right)e^{-z}dz-2f_d\log\left(\frac{\rho}{2f_d}+1\right). \tag{83}$$

As the mutual information rate is nonnegative, we can further modify the lower bound in (82) as follows:

$$\mathcal{I}'_{L_{\mathrm{mod}}}(\mathbf{y};\mathbf{x})=\max\left\{\mathcal{I}'_L(\mathbf{y};\mathbf{x}),0\right\}. \tag{84}$$
2) Upper Bound: Using (27), (36), and (74) we can upper-bound the achievable rate with i.i.d. zero-mean proper Gaussian input symbols and a rectangular PSD of the channel fading process by

$$\mathcal{I}'(\mathbf{y};\mathbf{x})\le h'_U(\mathbf{y})-h'_L(\mathbf{y}|\mathbf{x})=\log\left(\rho+1\right)-2f_d\int_{0}^{\infty}\log\left(\frac{\rho}{2f_d}z+1\right)e^{-z}dz=\mathcal{I}'_U(\mathbf{y};\mathbf{x}). \tag{85}$$

Notice that for the derivation of this upper bound the assumption of independent input symbols has not been used. On the other hand, the restriction to identically distributed input symbols is required for the calculation of $h'_L(\mathbf{y}|\mathbf{x})$ and, thus, for $\mathcal{I}'_U(\mathbf{y};\mathbf{x})$.
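The bounds (83)/(84) and (85) are straightforward to evaluate numerically. The following sketch uses hypothetical function names and example parameters, and a quadrature with the substitution $z=e^u$ that is an implementation choice rather than part of the derivation; it returns both bounds in nats per channel use.

```python
import math

def mean_log(c, h=0.01):
    """Integral of log(1 + c z) exp(-z) over z >= 0 (substitution z = exp(u))."""
    total = 0.0
    for i in range(int(round(44.0 / h)) + 1):
        u = -40.0 + i * h
        total += math.log1p(c * math.exp(u)) * math.exp(u - math.exp(u))
    return total * h

def lower_bound_rect(rho, f_d):
    """Lower bound (83), clipped at zero as in (84)."""
    value = mean_log(rho) - 2.0 * f_d * math.log(rho / (2.0 * f_d) + 1.0)
    return max(value, 0.0)

def upper_bound_rect(rho, f_d):
    """Upper bound (85) for a rectangular PSD."""
    return math.log(rho + 1.0) - 2.0 * f_d * mean_log(rho / (2.0 * f_d))

if __name__ == "__main__":
    for snr_db in (0, 10, 20):
        rho = 10.0 ** (snr_db / 10.0)
        for f_d in (0.01, 0.1, 0.3):
            lo, up = lower_bound_rect(rho, f_d), upper_bound_rect(rho, f_d)
            print(snr_db, f_d, lo / math.log(2.0), up / math.log(2.0))
```

Dividing by $\log 2$ converts the values to bits per channel use.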

To the best of our knowledge, the upper bound in (85) is new. Most other available upper bounds on the capacity hold only for input distributions with a peak power constraint and become loose for high peak-to-average power ratios, see, e.g., [2] and [1]. However, it has to be stated that the peak power constrained upper bounds in [2] and [1] are upper bounds on capacity and hold for an arbitrary PSD of the channel fading process.

As the mutual information rate in case of perfect channel state information at the receiver $\mathcal{I}'(\mathbf{y};\mathbf{x}|\mathbf{h})$ always upper-bounds the mutual information rate in the absence of channel state information, i.e.,

$$\mathcal{I}'(\mathbf{y};\mathbf{x})\le\mathcal{I}'(\mathbf{y};\mathbf{x}|\mathbf{h}) \tag{86}$$

we can modify the upper bound in (85) as follows:

$$\mathcal{I}'_{U_{\mathrm{mod}}}(\mathbf{y};\mathbf{x})=\min\left\{\mathcal{I}'_U(\mathbf{y};\mathbf{x}),\mathcal{I}'(\mathbf{y};\mathbf{x}|\mathbf{h})\right\} \tag{87}$$

with $\mathcal{I}'(\mathbf{y};\mathbf{x}|\mathbf{h})$ given in (31).

3) Tightness of Bounds: In the following we study the tightness of the given bounds on the achievable 
rate for i.i.d. zero-mean proper Gaussian input symbols. 

Fig. 1 shows the upper bound (85)/(87) and the lower bound (83)/(84) on the achievable rate with i.i.d. zero-mean proper Gaussian input symbols as a function of the channel dynamics, which is characterized by $f_d$, in case the PSD of the channel fading process is rectangular, for different SNRs. The achievable rate strongly decreases with increasing channel dynamics, i.e., $f_d$. Furthermore, the gap between the upper and the lower bound depends on the SNR and gets larger with an increasing SNR. In the following, we study the tightness of the given bounds analytically. This examination will show that the gap between the upper and the lower bound is bounded.

To evaluate the tightness of the upper and the lower bound on the achievable rate with i.i.d. zero-mean proper Gaussian input symbols, we first evaluate the tightness of the upper and the lower bound on the channel output entropy rate $h'(\mathbf{y})$. Afterwards, we evaluate the tightness of the upper and lower bound on $h'(\mathbf{y}|\mathbf{x})$.

The difference between the upper bound $h'_U(\mathbf{y})$ and the lower bound $h'_L(\mathbf{y})$ in (36) and (33) is given by

$$\Delta_{h'(\mathbf{y})}=h'_U(\mathbf{y})-h'_L(\mathbf{y})=\log\left(1+\rho\right)-\int_{0}^{\infty}\log\left(1+\rho z\right)e^{-z}dz. \tag{88}$$

A plot of this difference is given in Appendix B. For $\rho\to 0$ the difference $\Delta_{h'(\mathbf{y})}$ converges to zero, and for $\rho\to\infty$ the difference is given by

$$\lim_{\rho\to\infty}\Delta_{h'(\mathbf{y})}=\gamma\approx 0.57721\ \text{[nat/cu]} \tag{89}$$

where $\gamma$ is the Euler constant. The limit in (89) can be found in [10].
Fig. 1. Upper and lower bound on the achievable rate with i.i.d. zero-mean proper Gaussian input distribution on a Rayleigh flat-fading channel with a rectangular PSD in bits per channel use (cu) over $f_d$

The difference $\Delta_{h'(\mathbf{y})}$ monotonically increases with the SNR, as

$$\frac{\partial\Delta_{h'(\mathbf{y})}}{\partial\rho}=\frac{1}{1+\rho}-\int_{0}^{\infty}\frac{z}{1+\rho z}e^{-z}dz\overset{(a)}{\ge}\frac{1}{1+\rho}-\frac{1}{1+\rho}=0 \tag{90}$$

where for (a) we have used that $\frac{z}{1+\rho z}$ is concave in $z$ and, thus, we can apply Jensen's inequality. Thus, $\Delta_{h'(\mathbf{y})}$ is bounded by

$$0\le\Delta_{h'(\mathbf{y})}\le\gamma. \tag{91}$$

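In addition, the difference between the upper bound and the lower bound on $h'(\mathbf{y}|\mathbf{x})$ in case of a rectangular PSD is given by, cf. (45) and (74),

$$\Delta_{h'(\mathbf{y}|\mathbf{x})}=h'_U(\mathbf{y}|\mathbf{x})-h'_L(\mathbf{y}|\mathbf{x})=2f_d\left\{\log\left(1+\frac{\rho}{2f_d}\right)-\int_{0}^{\infty}\log\left(1+\frac{\rho}{2f_d}z\right)e^{-z}dz\right\}. \tag{92}$$

For asymptotically small Doppler frequencies $\Delta_{h'(\mathbf{y}|\mathbf{x})}$ approaches zero independently of the SNR. Furthermore, observing the structural similarity between (92) and (88), it can be shown that

$$\lim_{\rho\to 0}\Delta_{h'(\mathbf{y}|\mathbf{x})}=0 \tag{93}$$

independently of $f_d$. For asymptotically high SNR and a fixed $f_d$ the difference is bounded by

$$\lim_{\rho\to\infty}\Delta_{h'(\mathbf{y}|\mathbf{x})}=2f_d\gamma\approx 2f_d\cdot 0.57721\ \text{[nat/cu]} \tag{94}$$

where the same limit as in (89) is used. Corresponding to $\Delta_{h'(\mathbf{y})}$, $\Delta_{h'(\mathbf{y}|\mathbf{x})}$ is monotonically increasing with the SNR and, thus, it can be bounded by

$$0\le\Delta_{h'(\mathbf{y}|\mathbf{x})}\le\gamma\,2f_d\ \text{[nat/cu]}. \tag{95}$$

With $\Delta_{h'(\mathbf{y})}$ and $\Delta_{h'(\mathbf{y}|\mathbf{x})}$ the difference between the upper bound $\mathcal{I}'_U(\mathbf{y};\mathbf{x})$ and the lower bound $\mathcal{I}'_L(\mathbf{y};\mathbf{x})$ for i.i.d. zero-mean proper Gaussian input symbols and a rectangular PSD is given by

$$\Delta_{\mathcal{I}'(\mathbf{y};\mathbf{x})}=\mathcal{I}'_U(\mathbf{y};\mathbf{x})-\mathcal{I}'_L(\mathbf{y};\mathbf{x})\big|_{\mathrm{Rect}}=\Delta_{h'(\mathbf{y})}+\Delta_{h'(\mathbf{y}|\mathbf{x})}. \tag{96}$$

Hence, we get the following limits

$$\lim_{\rho\to 0}\Delta_{\mathcal{I}'(\mathbf{y};\mathbf{x})}=0,\qquad\lim_{\rho\to\infty}\Delta_{\mathcal{I}'(\mathbf{y};\mathbf{x})}=\gamma\left(1+2f_d\right) \tag{97}$$

and as $\Delta_{h'(\mathbf{y})}$, $\Delta_{h'(\mathbf{y}|\mathbf{x})}$ and, thus, $\Delta_{\mathcal{I}'(\mathbf{y};\mathbf{x})}$ monotonically increase with the SNR, we can bound the difference by

$$0\le\Delta_{\mathcal{I}'(\mathbf{y};\mathbf{x})}\le\gamma\left(1+2f_d\right)\ \text{[nat/cu]}. \tag{98}$$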
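The origin of the Euler constant in (89), (94), and (97) becomes visible when the integral is written in closed form as $\int_0^\infty\log(1+cz)e^{-z}dz=e^{1/c}E_1(1/c)$, where $E_1$ is the exponential integral with the series $E_1(x)=-\gamma-\log x+\sum_{k\ge 1}(-1)^{k+1}x^k/(k\cdot k!)$. The sketch below uses this standard identity, which is not stated in the text, together with hypothetical function names, to evaluate the gap (96) for $\rho\ge 1$ and compare it with the bound (98).

```python
import math

EULER_GAMMA = 0.5772156649015329

def exp_integral_e1(x, terms=60):
    """Exponential integral E_1(x) for 0 < x <= 1 via its power series."""
    total, term = 0.0, 1.0
    for k in range(1, terms + 1):
        term *= -x / k      # term = (-x)^k / k!
        total -= term / k   # adds (-1)^(k+1) x^k / (k * k!)
    return -EULER_GAMMA - math.log(x) + total

def mean_log(c):
    """Integral of log(1 + c z) exp(-z) over z >= 0, valid here for c >= 1."""
    return math.exp(1.0 / c) * exp_integral_e1(1.0 / c)

def gap_rate(rho, f_d):
    """Gap (96) between the bounds (85) and (83), in nats per channel use."""
    c = rho / (2.0 * f_d)
    gap_y = math.log1p(rho) - mean_log(rho)                    # (88)
    gap_y_given_x = 2.0 * f_d * (math.log1p(c) - mean_log(c))  # (92)
    return gap_y + gap_y_given_x

if __name__ == "__main__":
    f_d = 0.1
    for rho in (1.0, 10.0, 100.0, 1e4, 1e8):
        print(rho, gap_rate(rho, f_d), EULER_GAMMA * (1.0 + 2.0 * f_d))
```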

a) Asymptotically Small Channel Dynamics: For asymptotically small channel dynamics, i.e., $f_d\to 0$, the lower bound $\mathcal{I}'_L(\mathbf{y};\mathbf{x})$ in (82) converges to the mutual information rate in case of perfect channel knowledge in (31),

$$\lim_{f_d\to 0}\mathcal{I}'_L(\mathbf{y};\mathbf{x})=\mathcal{I}'(\mathbf{y};\mathbf{x}|\mathbf{h}) \tag{99}$$

i.e., to the coherent capacity. This corresponds to the physical interpretation that a channel that changes arbitrarily slowly can be estimated arbitrarily well, and, therefore, the penalty term $\mathcal{I}'(\mathbf{x};\mathbf{h}|\mathbf{y})$ in (24) approaches zero. Thus, for $f_d\to 0$, the lower bound $\mathcal{I}'_L(\mathbf{y};\mathbf{x})$ is tight.



E. The Asymptotic High SNR Behavior 

In this section, we examine the slope of the achievable rate over the SNR for asymptotically large SNRs depending on the channel dynamics. For a compactly supported PSD the lower bound on the achievable rate with i.i.d. zero-mean proper Gaussian input symbols given in (82) is characterized by the following high SNR slope$^4$, which is often named pre-log:

$$\lim_{\rho\to\infty}\frac{\partial\mathcal{I}'_L(\mathbf{y};\mathbf{x})}{\partial\log(\rho)}=\lim_{\rho\to\infty}\frac{\partial}{\partial\log(\rho)}\left\{\int_{0}^{\infty}\log\left(\rho z+1\right)e^{-z}dz-\int_{-\frac{1}{2}}^{\frac{1}{2}}\log\left(\frac{S_h(f)}{\sigma_h^2}\rho+1\right)df\right\}$$

$$=\lim_{\rho\to\infty}\left\{\int_{0}^{\infty}\frac{\rho z}{\rho z+1}e^{-z}dz-\int_{-\frac{1}{2}}^{\frac{1}{2}}\frac{\frac{S_h(f)}{\sigma_h^2}\rho}{\frac{S_h(f)}{\sigma_h^2}\rho+1}df\right\}=1-2f_d \tag{100}$$

as $S_h(f)\ne 0$ for $|f|\le f_d$.

The upper bound $\mathcal{I}'_U(\mathbf{y};\mathbf{x})$ holds only for the special case of a rectangular PSD of the channel fading process. For this case the difference between the upper bound $\mathcal{I}'_U(\mathbf{y};\mathbf{x})$ and the lower bound $\mathcal{I}'_L(\mathbf{y};\mathbf{x})$ converges to a constant for high SNR, cf. (97). Thus, both bounds must have the same asymptotic high SNR slope and we conjecture that the achievable rate $\mathcal{I}'(\mathbf{y};\mathbf{x})$ is also characterized by the same asymptotic SNR slope. In [3] it has been shown that the high SNR slope (pre-log) of the peak power constrained capacity also corresponds to $1-2f_d$. For a more detailed discussion on this, we refer to Section III-F.

$^4$ When using the term high SNR slope we refer to the high SNR limit of the derivative of the achievable rate (bound) with respect to the logarithm of the SNR.
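The pre-log in (100) can be observed numerically as a finite-difference slope of the lower bound (83) over $\log\rho$ at high SNR. The following is a sketch for the rectangular PSD only; the function names, the finite-difference step, and the quadrature (substitution $z=e^u$) are illustrative assumptions.

```python
import math

def mean_log(c, h=0.01):
    """Integral of log(1 + c z) exp(-z) over z >= 0 (substitution z = exp(u))."""
    total = 0.0
    for i in range(int(round(44.0 / h)) + 1):
        u = -40.0 + i * h
        total += math.log1p(c * math.exp(u)) * math.exp(u - math.exp(u))
    return total * h

def lower_bound_rect(rho, f_d):
    """Lower bound (83) for a rectangular PSD, in nats per channel use."""
    return mean_log(rho) - 2.0 * f_d * math.log(rho / (2.0 * f_d) + 1.0)

def high_snr_slope(rho, f_d, factor=10.0):
    """Finite-difference slope of (83) with respect to log(rho)."""
    delta = lower_bound_rect(rho * factor, f_d) - lower_bound_rect(rho, f_d)
    return delta / math.log(factor)

if __name__ == "__main__":
    for f_d in (0.05, 0.1, 0.25):
        print(f_d, high_snr_slope(1e5, f_d), 1.0 - 2.0 * f_d)
```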

It is interesting to note that the high SNR slope of the achievable rate is degraded by the term $2f_d$. Now, recall the discussion on the limitations of the discrete-time input-output relation in Section II-B. There it has been shown that symbol rate sampling does not yield a signal representation with a sufficient statistic, as the normalized received signal bandwidth is given by $1+2f_d$ (for a roll-off factor $\beta_{ro}=0$). The excess bandwidth leading to aliasing is given by $2f_d$, which exactly corresponds to the degradation of the high SNR slope of the achievable rate. Up to now, we do not know if there is an implicit relation between these observations.



F. Comparison to Asymptotes in [3]

In [3], Lapidoth gives bounds for the capacity of noncoherent Rayleigh fading channels. These bounds are mainly derived to evaluate the asymptotic high SNR behavior. He distinguishes between two cases, nonregular and regular fading, introduced by Doob [14]. The case of nonregular fading is characterized by the property that the prediction error variance of a one-step channel predictor, having infinitely many observations in the past, asymptotically approaches zero when the SNR approaches infinity. As we consider the case that the PSD of the channel fading process is bandlimited with $f_d<0.5$, our scenario corresponds to the nonregular case in [3], which is also named pre-log case. In contrast to our bounds on the achievable rate, where we assume i.i.d. zero-mean proper Gaussian input symbols with an average power $\sigma_x^2$, [3] does not constrain the input distribution except for a peak power constraint.

The capacity bounds in [3] are given by [3, eq. (33) and (47)]

C < loglogp - 7 - 1 + log ( 2 ) + o(l) (101) 

\ e pred\ l l P) J 

C > log | 2 1 8 ) -7-logh 2 l ,.,-J -logfy), (102) 

where $\gamma\approx 0.577$ is the Euler constant and $\tilde{\rho}$ is defined as

$$\tilde{\rho}=\frac{P_{\mathrm{peak}}\sigma_h^2}{\sigma_n^2} \tag{103}$$

i.e., it is an alternative definition of an SNR based on the peak power $P_{\mathrm{peak}}$ instead of the average power $\sigma_x^2$ used for the definition of the average SNR $\rho$. Furthermore, $o(1)$ depends on the SNR and converges to zero for $\tilde{\rho}\to\infty$, i.e., $f(n)\in o(g(n))$ if

$$\lim_{n\to\infty}\frac{f(n)}{g(n)}=0. \tag{104}$$

In addition, the prediction error variance $\epsilon^2_{\mathrm{pred}}(\delta^2)$ is given by

4red(5 2 ) = exp log + 5^ - 5\ (105) 

Although no explicit average power constraint has been used for the bounds on the peak power constrained capacity in [3], but only a peak power constraint, this peak power constraint implicitly also constrains the average power, as for the average power $\sigma_x^2$ the inequality $\sigma_x^2\le P_{\mathrm{peak}}$ must hold. Furthermore, it has to be considered that in case of using a peak power constraint, it is in general not optimal to use the maximum average power $\sigma_x^2$. For a discussion on this see below in Section III-G. In case the maximum average power $\sigma_x^2$ is not used, i.e., $\mathrm{E}[|x_k|^2]<\sigma_x^2$, the SNR $\rho$ as defined in (11) is not the actual average SNR. Therefore, in the case of using a peak power constraint, $\rho$ is named nominal average SNR. However, in the case of i.i.d. zero-mean proper Gaussian
Fig. 2. Comparison of the bounds on the achievable rate with i.i.d. zero-mean proper Gaussian inputs in (85)/(87) and (83)/(84) (SNR $\rho$) with asymptotic bounds on the peak power constrained capacity in (101) and (102) (SNR $\tilde{\rho}$), [3, eq. (33) and (47)] (The asymptotic upper bound (101) only holds for $\tilde{\rho}\to\infty$ as we neglect the term $o(1)$ in (101), which approaches zero for $\tilde{\rho}\to\infty$.); rectangular PSD of the channel fading process



input symbols, the achievable rate is maximized when using the maximum average power $\sigma_x^2$, i.e., in this case the nominal average SNR $\rho$ is also the actual average SNR.

As the peak power constraint that has been used for the bounds on the peak power constrained capacity in [3], i.e., for (101) and (102), implicitly constrains the average power to $\sigma_x^2$, for the comparison of the bounds on the achievable rate with i.i.d. zero-mean proper Gaussian input symbols and the bounds on the peak power constrained capacity in [3], we choose $\tilde{\rho}$ in (101) and (102) to be equal to the nominal average SNR $\rho$ used for the bounds on the achievable rate with i.i.d. Gaussian input symbols, i.e., we set

$$\sigma_x^2=P_{\mathrm{peak}}.$$

Fig. 2 shows a comparison of the lower bound on the capacity in (83)/(84) and the upper bound on the achievable rate with i.i.d. zero-mean proper Gaussian inputs in (85)/(87) with the high SNR asymptotes for the capacity in the corresponding pre-log case given in [3], i.e., (101) and (102). The bounds on the achievable rate with i.i.d. zero-mean proper Gaussian input symbols, i.e., the lower bound in (83)/(84) and our upper bound in (85)/(87), lie between the asymptotes for the upper bound and the lower bound on capacity given in [3]. However, the bounds in [3] consider a peak power constrained input distribution. Therefore, this comparison is not entirely fair. In addition, and this is the main observation from this comparison, our bounds have the same slope in the high SNR regime as the high SNR asymptotes for the peak power constrained capacity in [3].

G. Comparison to Capacity Bounds for Peak Power Constrained Inputs in [2] and [1]

In the following, we will draw the connection of the bounds on the achievable rate with i.i.d. zero-mean proper Gaussian input symbols given in (82)/(84) and (85)/(87) with the bounds on the peak power constrained capacity given in [2] and [1]. The peak power constrained capacity is defined by

$$C_{\mathrm{peak}}=\lim_{N\to\infty}\sup_{\mathcal{P}^{\mathrm{peak}}}\frac{1}{N}\mathcal{I}(\mathbf{y};\mathbf{x}) \tag{106}$$

with $\mathcal{P}^{\mathrm{peak}}$ being the set of peak power constrained probability density functions of the input distribution given by

$$\mathcal{P}^{\mathrm{peak}}=\left\{p(\mathbf{x})\ \Big|\ \mathbf{x}\in\mathbb{C}^N,\ \frac{1}{N}\mathrm{E}\left[\mathbf{x}^H\mathbf{x}\right]\le\sigma_x^2,\ |x_k|^2\le P_{\mathrm{peak}}\ \forall k\right\}. \tag{107}$$

Following along the lines of the derivation of the bounds on the achievable rate with i.i.d. zero-mean proper Gaussian input symbols, we discuss the differences when deriving bounds on the capacity with peak power constrained input symbols.

1) Upper Bound: We start with the derivation of the upper bound on the peak power constrained capacity. With (27), an upper bound on the peak power constrained capacity can be given based on an upper bound on the output entropy $h'(\mathbf{y})$ and a lower bound on the conditional output entropy $h'(\mathbf{y}|\mathbf{x})$, resulting in$^5$

$$\sup_{\mathcal{P}^{\mathrm{peak}}}\mathcal{I}'(\mathbf{y};\mathbf{x})\le\sup_{\mathcal{P}^{\mathrm{peak}}}\left\{h'_U(\mathbf{y})-h'_L(\mathbf{y}|\mathbf{x})\right\}. \tag{108}$$

Note that at the moment $h'_U(\mathbf{y})$ and $h'_L(\mathbf{y}|\mathbf{x})$ are only placeholders for upper and lower bounds, which are not yet further specified. In the following, we will relate the derivation of the corresponding bounds given above for i.i.d. zero-mean proper Gaussian input symbols to the case of peak power constrained input symbols considered here.

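As the derivation of the lower bound on $h'(\mathbf{y}|\mathbf{x})$ in Section III-C2 relies on the assumption of identically distributed (i.d.) input symbols, in a first step we restrict to this kind of input distributions and, therefore, define the following set of probability density functions:

$$\mathcal{P}^{\mathrm{peak}}_{\mathrm{i.d.}}=\left\{p(\mathbf{x})\ \Big|\ \mathbf{x}\in\mathbb{C}^N,\ p(x_i)=p(x_j)\ \forall i,j,\ \mathrm{E}\left[|x_k|^2\right]\le\sigma_x^2,\ |x_k|^2\le P_{\mathrm{peak}}\ \forall k\right\} \tag{109}$$

which corresponds to the set $\mathcal{P}^{\mathrm{peak}}$ in (107) with the further restriction that the input symbols are identically distributed. I.e., we derive an upper bound on

$$\sup_{\mathcal{P}^{\mathrm{peak}}_{\mathrm{i.d.}}}\mathcal{I}'(\mathbf{y};\mathbf{x})\le\sup_{\mathcal{P}^{\mathrm{peak}}_{\mathrm{i.d.}}}\left\{h'_U(\mathbf{y})-h'_L(\mathbf{y}|\mathbf{x})\right\}=\sup_{\alpha\in[0,1]}\ \sup_{\mathcal{P}^{\mathrm{peak}}_{\mathrm{i.d.}}|\alpha}\left\{h'_U(\mathbf{y})-h'_L(\mathbf{y}|\mathbf{x})\right\}. \tag{110}$$

The calculation of the supremum in (110) is done in two steps. The inner supremum is taken over the constrained set $\mathcal{P}^{\mathrm{peak}}_{\mathrm{i.d.}}|\alpha$, which is characterized by an average power $\alpha\sigma_x^2$ that holds with equality. Because in (109) we only use a constraint on the maximum average input power given by $\sigma_x^2$, the outer supremum is taken over $\alpha\in[0,1]$. The set $\mathcal{P}^{\mathrm{peak}}_{\mathrm{i.d.}}|\alpha$ is given by

$$\mathcal{P}^{\mathrm{peak}}_{\mathrm{i.d.}}\big|\alpha=\left\{p(\mathbf{x})\ \Big|\ \mathbf{x}\in\mathbb{C}^N,\ p(x_i)=p(x_j)\ \forall i,j,\ \mathrm{E}\left[|x_k|^2\right]=\alpha\sigma_x^2,\ |x_k|^2\le P_{\mathrm{peak}}\ \forall k\right\} \tag{111}$$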



which corresponds to the set Vf% except that the average power is now fixed to aa 2 with equality. Such 
a separation has also been used in [1J and in flSJ. 

$^5$ Note that in (108) we make a slight misuse of notation. The set $\mathcal{P}^{\mathrm{peak}}$ is defined for input vectors $\mathbf{x}$ of length $N$. Therefore, the exchange of the limit and the supremum as it is used in (108) while using the mutual information rate is formally not correct. However, to avoid a further complication of notation, we use the set $\mathcal{P}^{\mathrm{peak}}$ also in the context of information rates. The same holds also in the following for other sets of input distributions.
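For the evaluation of (110) we require an upper bound on $h'(\mathbf{y})$ and a lower bound on $h'(\mathbf{y}|\mathbf{x})$ which hold for all input distributions contained in the set $\mathcal{P}^{\mathrm{peak}}_{\mathrm{i.d.}}|\alpha$.

All steps of the derivation of the upper bound on $h'(\mathbf{y})$ in Section III-B2 for the case of i.i.d. zero-mean proper Gaussian input symbols hold also for all input distributions contained in the set $\mathcal{P}^{\mathrm{peak}}_{\mathrm{i.d.}}|\alpha$, except that (a) in (34) is now an inequality (Hadamard's inequality), as the matrix $\mathbf{R}_y$ is not necessarily diagonal because we have dropped the restriction to i.i.d. input symbols. Furthermore, the average transmit power of the input symbols is now fixed to $\alpha\sigma_x^2$ and, thus, we get the following upper bound:

$$h'_U(\mathbf{y})\big|_{p(\mathbf{x})\in\mathcal{P}^{\mathrm{peak}}_{\mathrm{i.d.}}|\alpha}=\log\left(\pi e\left(\alpha\sigma_x^2\sigma_h^2+\sigma_n^2\right)\right). \tag{112}$$

Concerning the lower bound on $h'(\mathbf{y}|\mathbf{x})$, up to (73) only the assumption of i.d. input symbols has been used, which also holds for all input probability density functions contained in $\mathcal{P}^{\mathrm{peak}}_{\mathrm{i.d.}}|\alpha$. Thus, substituting (112) and (73) into (110) we get

$$\sup_{\mathcal{P}^{\mathrm{peak}}_{\mathrm{i.d.}}}\mathcal{I}'(\mathbf{y};\mathbf{x})\le\sup_{\alpha\in[0,1]}\ \sup_{\mathcal{P}^{\mathrm{peak}}_{\mathrm{i.d.}}|\alpha}\left\{h'_U(\mathbf{y})-h'_L(\mathbf{y}|\mathbf{x})\right\}=\sup_{\alpha\in[0,1]}\ \sup_{\mathcal{P}^{\mathrm{peak}}_{\mathrm{i.d.}}|\alpha}\left\{\log\left(\alpha\rho+1\right)-2f_d\,\mathrm{E}_x\log\left(\frac{\sigma_h^2}{2f_d\sigma_n^2}|x|^2+1\right)\right\}$$

$$=\sup_{\alpha\in[0,1]}\left\{\log\left(\alpha\rho+1\right)-2f_d\inf_{\mathcal{P}^{\mathrm{peak}}_{\mathrm{i.d.}}|\alpha}\mathrm{E}_x\log\left(\frac{\sigma_h^2}{2f_d\sigma_n^2}|x|^2+1\right)\right\} \tag{113}$$

with the nominal average SNR $\rho$ given in (11).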

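The term containing the infimum on the RHS of (113) can be lower-bounded in the following way:

$$\inf_{\mathcal{P}^{\mathrm{peak}}_{\mathrm{i.d.}}|\alpha}\mathrm{E}_x\log\left(\frac{\sigma_h^2}{2f_d\sigma_n^2}|x|^2+1\right)=\inf_{\mathcal{P}^{\mathrm{peak}}_{\mathrm{i.d.}}|\alpha}\int_{|x|=0}^{\sqrt{P_{\mathrm{peak}}}}\frac{\log\left(\frac{\sigma_h^2}{2f_d\sigma_n^2}|x|^2+1\right)}{|x|^2}|x|^2p(|x|)\,d|x|$$

$$\overset{(a)}{\ge}\frac{\log\left(\frac{\sigma_h^2}{2f_d\sigma_n^2}P_{\mathrm{peak}}+1\right)}{P_{\mathrm{peak}}}\inf_{\mathcal{P}^{\mathrm{peak}}_{\mathrm{i.d.}}|\alpha}\int_{|x|=0}^{\sqrt{P_{\mathrm{peak}}}}|x|^2p(|x|)\,d|x|=\frac{\log\left(\frac{\sigma_h^2}{2f_d\sigma_n^2}P_{\mathrm{peak}}+1\right)}{P_{\mathrm{peak}}}\alpha\sigma_x^2 \tag{114}$$

where for (a) we have used that all factors of the integrand are positive and that the term

$$\frac{1}{|x|^2}\log\left(\frac{\sigma_h^2}{2f_d\sigma_n^2}|x|^2+1\right)=\frac{1}{z}\log\left(cz+1\right) \tag{115}$$

with $c=\frac{\sigma_h^2}{2f_d\sigma_n^2}$ and $z=|x|^2$ is monotonically decreasing in $z$ as

$$\frac{\partial}{\partial z}\left\{\frac{1}{z}\log\left(cz+1\right)\right\}=\frac{c}{z\left(cz+1\right)}-\frac{\log\left(cz+1\right)}{z^2}\le 0\quad\Leftrightarrow\quad\frac{cz}{cz+1}\le\log\left(cz+1\right) \tag{116}$$

which holds for $cz>-1$. Thus, the term in (115) is minimized for $z=|x|^2=P_{\mathrm{peak}}$. A similar approach to calculate the infimum in (114) has been used in [1] for an analogous problem. Notice that the result given in (114) means that the infimum on $h'_L(\mathbf{y}|\mathbf{x})$ for a fixed average transmit power is achieved with on-off keying.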
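The monotonicity argument in (115) and (116) and the resulting on-off keying solution can be illustrated with a family of two-point distributions. In the sketch below, $|x|^2$ takes the value $q$ with probability $\alpha\sigma_x^2/q$ and zero otherwise, so that the average power is fixed to $\alpha\sigma_x^2$; the function names and parameter values are hypothetical.

```python
import math

def log_ratio(z, c):
    """The term (1/z) log(c z + 1) from (115)."""
    return math.log1p(c * z) / z

def two_point_mean_log(q, avg_power, c):
    """E log(c |x|^2 + 1) when |x|^2 = q with probability avg_power / q, else 0."""
    return (avg_power / q) * math.log1p(c * q)

def on_off_lower_bound(alpha, sigma_x2, p_peak, c):
    """RHS of (114): the value attained by on-off keying at the peak power."""
    return math.log1p(c * p_peak) / p_peak * alpha * sigma_x2

if __name__ == "__main__":
    alpha, sigma_x2, p_peak, c = 0.8, 1.0, 4.0, 5.0
    avg_power = alpha * sigma_x2
    for q in (avg_power, 1.5, 2.5, p_peak):
        print(q, two_point_mean_log(q, avg_power, c))
    print(on_off_lower_bound(alpha, sigma_x2, p_peak, c))
```

The case $q=\alpha\sigma_x^2$ corresponds to constant modulus signaling and yields the largest value within this family, while $q=P_{\mathrm{peak}}$ yields the smallest.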
With (114), we get the following upper bound on the RHS of (113):

$$\sup_{\mathcal{P}^{\mathrm{peak}}_{\mathrm{i.d.}}}\mathcal{I}'(\mathbf{y};\mathbf{x})\le\sup_{\alpha\in[0,1]}\left\{\log\left(\alpha\rho+1\right)-2f_d\frac{\alpha\sigma_x^2}{P_{\mathrm{peak}}}\log\left(\frac{\sigma_h^2}{2f_d\sigma_n^2}P_{\mathrm{peak}}+1\right)\right\}=\sup_{\alpha\in[0,1]}\left\{\log\left(\alpha\rho+1\right)-2f_d\frac{\alpha}{\beta}\log\left(\frac{\rho\beta}{2f_d}+1\right)\right\} \tag{117}$$

with the nominal peak-to-average power ratio$^6$

$$\beta=\frac{P_{\mathrm{peak}}}{\sigma_x^2}. \tag{118}$$

As the argument of the supremum on the RHS of (117) is concave in $\alpha$ and, thus, there exists a unique maximum, it can easily be shown that the supremum of (117) with respect to $\alpha\in[0,1]$ is attained at

$$\alpha_{\mathrm{opt}}=\min\left\{1,\left(\frac{2f_d}{\beta}\log\left(\frac{\rho\beta}{2f_d}+1\right)\right)^{-1}-\frac{1}{\rho}\right\} \tag{119}$$

and, thus,

$$\sup_{\mathcal{P}^{\mathrm{peak}}_{\mathrm{i.d.}}}\mathcal{I}'(\mathbf{y};\mathbf{x})\le\log\left(\alpha_{\mathrm{opt}}\rho+1\right)-2f_d\frac{\alpha_{\mathrm{opt}}}{\beta}\log\left(\frac{\rho\beta}{2f_d}+1\right)=\mathcal{I}'_U(\mathbf{y};\mathbf{x})\big|_{P_{\mathrm{peak}}}. \tag{120}$$

With (120), we have found an upper bound on the achievable rate with i.d. input symbols and a peak power constraint for the special case of a rectangular PSD of the channel fading process. Note that the notation $\mathcal{I}'_U(\mathbf{y};\mathbf{x})\big|_{P_{\mathrm{peak}}}$ denotes an upper bound on the peak power constrained achievable rate.

Note that $\alpha_{\mathrm{opt}}<1$ corresponds to the case that it is not optimal to use the maximum average transmit power allowed by the set $\mathcal{P}^{\mathrm{peak}}_{\mathrm{i.d.}}$. This behavior is a result of the peak power constraint. To see this, consider the extreme case $\beta=1$ and $f_d=0.5$, i.e., an uncorrelated channel. Then $\alpha=1$ would correspond to constant modulus signaling, i.e., the transmitter puts all information into the phase of the transmitted signal. As the channel is uncorrelated from symbol to symbol and unknown to the receiver, the mutual information rate $\mathcal{I}'(\mathbf{y};\mathbf{x})$ is zero. Therefore, it is better if the transmitter does not use all its transmit power, i.e., uses an $\alpha<1$, enabling modulation of the magnitude, which leads to a positive $\mathcal{I}'(\mathbf{y};\mathbf{x})$.
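The closed form (119) can be cross-checked against a brute-force search over $\alpha\in[0,1]$ of the argument of the supremum in (117). The sketch below uses hypothetical function names and example parameters; the grid search is only a sanity check and not part of the derivation.

```python
import math

def objective(alpha, rho, f_d, beta):
    """Argument of the supremum in (117)."""
    return (math.log(alpha * rho + 1.0)
            - 2.0 * f_d * alpha / beta * math.log(rho * beta / (2.0 * f_d) + 1.0))

def alpha_opt(rho, f_d, beta):
    """Maximizer (119): stationary point of (117), clipped to at most 1."""
    stationary = beta / (2.0 * f_d * math.log(rho * beta / (2.0 * f_d) + 1.0)) - 1.0 / rho
    return min(1.0, stationary)

def upper_bound_peak(rho, f_d, beta):
    """Upper bound (120) in nats per channel use."""
    return objective(alpha_opt(rho, f_d, beta), rho, f_d, beta)

def grid_maximum(rho, f_d, beta, points=10001):
    return max(objective(i / (points - 1), rho, f_d, beta) for i in range(points))

if __name__ == "__main__":
    for rho, f_d, beta in ((10.0, 0.5, 1.0), (10.0, 0.01, 2.0), (100.0, 0.3, 1.5)):
        print(rho, f_d, beta, alpha_opt(rho, f_d, beta),
              upper_bound_peak(rho, f_d, beta), grid_maximum(rho, f_d, beta))
```

For $\beta=1$ and $f_d=0.5$ the sketch returns $\alpha_{\mathrm{opt}}<1$, in line with the constant modulus example above.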

The choice a opt = 1, corresponding to the case that it is optimal to use the maximum possible average 
transmit power, can be shown to be optimal, on the one hand, if 

'1 



1<P<- 



cxp 



2 2/, 



(121) 



or, on the other hand, if 



2/ d < for p < 1. (122) 

p + 2 

$^6$ Instead of the common term peak-to-average power ratio we choose the term nominal peak-to-average power ratio, as in case of a peak power constraint it is not necessarily optimal to use the maximum average power $\sigma_x^2$. In case the actual average power is equal to the maximum average power $\sigma_x^2$, $\beta$ corresponds to the actual peak-to-average power ratio.
Fig. 3. Comparison of the upper bounds on the achievable rate for i.d. input symbols with a peak power constraint in (120)/(87) and with i.i.d. zero-mean proper Gaussian inputs (PG) in (85)/(87) (note that (85) also holds for i.d. zero-mean proper Gaussian input symbols); in addition, the capacity lower bound (127)/(84) is shown, which is achievable with i.i.d. input symbols and a peak power constraint



For a proof of these conditions see the Appendix. As in realistic scenarios $f_d$ is close to zero, the conditions (121) and (122) are typically fulfilled. However, for the parameter range displayed in Fig. 3 the conditions in (121) and (122) are not always fulfilled.$^7$

In terms of the analytical expression, the upper bound on the achievable rate with i.d. input symbols in (120) is equal to the upper bound on the peak power constrained capacity given in [1, Prop. 2.2]. However, the upper bound in [1, Prop. 2.2] is, on the one hand, an upper bound on capacity as, except for the peak and average power constraints, no further assumptions on the input distributions have been made. On the other hand, the upper bound in [1, Prop. 2.2] holds for arbitrary PSDs of the channel fading process, while the derivation of the upper bound in (120) is based on the assumption of a rectangular PSD of the channel fading process. Moreover, the approach of the derivation of the upper bound on the capacity given in [1, Prop. 2.2] is completely different from our approach and is inherently based on the peak power constraint, while we use this peak power constraint only in the last step of the derivation.

7 Note that in case of i.i.d. zero-mean proper Gaussian input symbols as discussed before, it is not necessary to use the factor $\alpha$, i.e., to
consider cases where the actual average transmit power is smaller than the maximum average transmit power, as it can be shown that the
upper bound in (85) is always maximized while using the maximum available average transmit power. The proof is based on the fact that
(85) monotonically increases with $\rho$ as

\begin{align}
\frac{\partial}{\partial\rho}\left\{\log(\rho+1) - 2f_d\int_0^\infty \log\left(\frac{\rho}{2f_d}z+1\right)e^{-z}\,dz\right\}
&= \frac{1}{\rho+1} - 2f_d\int_0^\infty \frac{\frac{z}{2f_d}}{\frac{\rho}{2f_d}z+1}\,e^{-z}\,dz \nonumber\\
&\overset{(b)}{\ge} \frac{1}{\rho+1} - 2f_d\,\frac{\frac{1}{2f_d}}{\frac{\rho}{2f_d}+1} \ge 0 \tag{123}
\end{align}

where for (b) we use that $\frac{z/(2f_d)}{\frac{\rho}{2f_d}z+1}$ is concave in $z$ and, thus, we can apply Jensen's inequality. This indicates that in the absence of a
peak power constraint it is optimal to use the maximum average transmit power $\sigma_x^2$.
Therefore, our lower bound on $h'(\mathbf{y}|\mathbf{x})$ in (73) also makes it possible to give an upper bound on the achievable rate
for non-peak power constrained input symbols like proper Gaussian input symbols.

As stated, we made the assumption of identically distributed (i.d.) input symbols in the derivation of
our upper bound. We do not know whether this assumption poses a real restriction in the sense of excluding the
capacity-achieving input distribution. To decide this, it would be necessary to know whether the capacity-achieving
input distribution is characterized by identically distributed input symbols. We have no answer to this
question. However, as in case of a peak power constraint our upper bound on the achievable rate given in
(120) corresponds to the upper bound on the peak power constrained capacity given in [1, Prop. 2.2], the
restriction to identically distributed inputs does not seem to be a severe restriction in the sense that it leads
to an upper bound being lower than the capacity.

However, in [1] it is shown that i.i.d. inputs, i.e., with an additional constraint on independent input
symbols, are not capacity achieving in general. Based on the parameter $\lambda$,
it has been shown in [1] that under the assumption of an absolutely summable autocorrelation function
$r_h(l)$, in the asymptotic low SNR limit i.i.d. inputs are only capacity-achieving in the following
two cases

• if $\lambda = \sigma_h^4$, corresponding to a memoryless channel,

• or with a nominal peak-to-average power ratio of $\beta = 1$ and $\lambda > 2\sigma_h^4$, i.e., when the fading process
is nonephemeral.

Notice that the proof in [1] is explicitly based on the asymptotic low SNR limit. On the other hand, for the
high SNR case i.i.d. zero-mean proper Gaussian inputs achieve the same asymptotic high SNR behavior,
in terms of the slope (pre-log), as the peak power constrained channel capacity, as has been discussed
in Section III.

In Fig. 3 the upper bound on the achievable rate with a peak power constraint in (120) is shown for
different nominal peak-to-average power ratios $\beta$ in comparison to the upper bound on the achievable rate
for zero-mean proper Gaussian input symbols in (85) (both combined with (87)). This comparison shows
that, except for $\beta$ close to 1 and a small to average SNR or sufficiently small channel dynamics, the upper
bound on the achievable rate for proper Gaussian inputs is lower than the upper bound for peak power
constrained input symbols in (120).
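The comparison described above can be reproduced numerically with a small sketch. It assumes a rectangular PSD and takes the two bounds in the forms suggested by footnote 7 and by the rectangular-PSD special case of [1, Prop. 2.2]: $\log(\rho+1) - 2f_d\int_0^\infty \log(\frac{\rho}{2f_d}z+1)e^{-z}dz$ for (85), and $\sup_\alpha\{\log(\alpha\rho+1) - \frac{2f_d\alpha}{\beta}\log(\frac{\rho\beta}{2f_d}+1)\}$ for (120). These forms, the function names, and the integration parameters are assumptions made for illustration; the combination with the coherent bound (87) is omitted, and rates are in nats.

```python
import math


def exp_weighted_log(a, z_max=60.0, steps=60000):
    """Approximate the integral of log(1 + a*z) * exp(-z) over z in [0, inf).

    Composite Simpson rule on [0, z_max] (steps must be even); the neglected
    tail is of order exp(-z_max) and therefore negligible here.
    """
    h = z_max / steps
    total = 0.0
    for i in range(steps + 1):
        z = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * math.log1p(a * z) * math.exp(-z)
    return total * h / 3.0


def upper_bound_pg(rho, f_d):
    """Assumed form of (85): i.i.d. zero-mean proper Gaussian inputs."""
    return math.log(rho + 1.0) - 2.0 * f_d * exp_weighted_log(rho / (2.0 * f_d))


def upper_bound_peak(rho, f_d, beta):
    """Assumed form of (120): peak power constrained inputs, optimized over alpha."""
    penalty = (2.0 * f_d / beta) * math.log(rho * beta / (2.0 * f_d) + 1.0)
    # Maximizer of the concave function log(alpha*rho + 1) - alpha*penalty on [0, 1].
    alpha_opt = min(1.0, 1.0 / penalty - 1.0 / rho)
    return math.log(alpha_opt * rho + 1.0) - alpha_opt * penalty
```

For example, with $f_d = 0.1$ and $\rho = 100$ the proper Gaussian bound lies below the peak power constrained bound for $\beta = 2$, whereas for $\beta = 1$ and $\rho = 1$ the order is reversed, in line with the statement above.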

a) High Nominal Peak-to-Average Power Ratios: Considering higher order modulation, the nominal
peak-to-average power ratio $\beta$ may become relatively large. For proper Gaussian inputs it is in fact
infinite. Obviously, for large peak powers $P_{\mathrm{peak}}$, the second term in the upper bound on the RHS of (120)
approaches zero and, thus,

\begin{equation}
\lim_{P_{\mathrm{peak}}\to\infty} \mathcal{I}'_U(\mathbf{y};\mathbf{x})\big|_{P_{\mathrm{peak}}} = \log(\rho+1) \tag{124}
\end{equation}

which obviously is loose, as this is the capacity of an AWGN channel and is already larger than the
coherent capacity of the fading channel. This underlines the value of the upper bound on the achievable
rate with i.i.d. zero-mean proper Gaussian input symbols, which are not peak power limited and serve
well to upper-bound the achievable rate with practical modulation and coding schemes.

It can be shown that the upper bound in (120) becomes loose for $\beta > 1$ and high SNR. Therefore,
we calculate the asymptotic high SNR slope of the peak power constrained upper bound given in (120),
where for the moment we restrict to the case of using the maximum average power, i.e., $\alpha = 1$, although
this is in general not an upper bound on the achievable rate. The motivation for this will become obvious
afterwards. For the peak power constrained upper bound given in (117) and for the special case $\alpha = 1$,

\begin{equation}
\mathcal{I}'_U(\mathbf{y};\mathbf{x})\big|_{P_{\mathrm{peak}},\,\alpha=1} = \log(\rho+1) - \frac{2f_d}{\beta}\log\left(\frac{\rho\beta}{2f_d}+1\right) \tag{125}
\end{equation}

the derivative with respect to $\log(\rho)$ in the high SNR limit is given by

\begin{align}
\lim_{\rho\to\infty} \frac{\partial\,\mathcal{I}'_U(\mathbf{y};\mathbf{x})\big|_{P_{\mathrm{peak}},\,\alpha=1}}{\partial\log(\rho)}
&= \lim_{\rho\to\infty} \frac{\partial}{\partial\log(\rho)}\left\{\log(\rho+1) - \frac{2f_d}{\beta}\log\left(\frac{\rho\beta}{2f_d}+1\right)\right\} \nonumber\\
&= \lim_{\rho\to\infty} \left\{\frac{\rho}{\rho+1} - \frac{2f_d}{\beta}\,\frac{\frac{\rho\beta}{2f_d}}{\frac{\rho\beta}{2f_d}+1}\right\}
= 1 - \frac{2f_d}{\beta}. \tag{126}
\end{align}

Obviously, if the nominal peak-to-average power ratio $\beta$ is not equal to one, the slope of the peak power
constrained upper bound with the constraint $\alpha = 1$ is higher than the slope of the high SNR asymptote
on the peak power constrained capacity given in [3], see (101), which is given by $1 - 2f_d$. This asymptotic
bound holds for an arbitrary nominal peak-to-average power ratio $\beta$. As an optimized $\alpha$ will lead
to a larger upper bound, this reveals that the peak power constrained upper bound on the achievable rate
in (120) is loose for $\beta > 1$ and high SNR.
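The slope argument can be checked numerically. The following sketch assumes a rectangular PSD and the $\alpha = 1$ form $\log(\rho+1) - \frac{2f_d}{\beta}\log(\frac{\rho\beta}{2f_d}+1)$ of the peak power constrained bound; the function names, the evaluation point $\rho = 10^9$, and the step size are illustrative choices, not part of the derivation. It estimates the derivative with respect to $\log(\rho)$ by a central finite difference.

```python
import math


def peak_bound_alpha_one(rho, f_d, beta):
    """Peak power constrained bound with alpha fixed to 1 (rectangular PSD assumed)."""
    return math.log(rho + 1.0) - (2.0 * f_d / beta) * math.log(rho * beta / (2.0 * f_d) + 1.0)


def high_snr_slope(f_d, beta, rho=1e9, rel_step=1e-3):
    """Central finite-difference estimate of d(bound)/d(log rho) at a large SNR rho."""
    lo, hi = rho * (1.0 - rel_step), rho * (1.0 + rel_step)
    diff = peak_bound_alpha_one(hi, f_d, beta) - peak_bound_alpha_one(lo, f_d, beta)
    return diff / (math.log(hi) - math.log(lo))
```

For $f_d = 0.1$ the estimate approaches $1 - 2f_d/\beta$, i.e., about 0.9 for $\beta = 2$ compared to the capacity pre-log $1 - 2f_d = 0.8$, and the two coincide for $\beta = 1$.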

2) Lower Bound: As done before for the upper bound, we now discuss the relation between
the lower bound on the achievable rate with i.i.d. zero-mean proper Gaussian input symbols in (82) and
the lower bound on the capacity for peak power constrained input symbols given in [2, (34)]. The lower
bound on the achievable rate given in (82) obviously does not hold in case of a peak power constrained
input, as in this case the coherent mutual information rate $\mathcal{I}'(\mathbf{y};\mathbf{x}|\mathbf{h})$, being used in (30) to calculate a
lower bound on $h'(\mathbf{y})$, is smaller than (31), which holds for the case of i.i.d. zero-mean proper Gaussian
inputs, the capacity-achieving input distribution in the coherent case.

As the mutual information for an arbitrary input distribution in the set $\mathcal{P}_{\mathrm{peak}}$ defined in (107) is a lower
bound on the capacity with a peak power constrained input distribution, in a first step we assume a constant
modulus (CM) input distribution, i.e., all input symbols have power $\sigma_x^2$ and a uniformly distributed phase.

Based on (27) and (30), a lower bound on the mutual information rate with constant modulus input
symbols is given by

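\begin{align}
\sup_{\mathcal{P}_{\mathrm{peak}}} \mathcal{I}'(\mathbf{y};\mathbf{x})
&\ge \sup_{\mathcal{P}_{\mathrm{peak}}} \left\{ \mathcal{I}(y;x|h) + h'(\mathbf{y}|\mathbf{x},\mathbf{h}) - h'(\mathbf{y}|\mathbf{x}) \right\} \nonumber\\
&\ge \left\{ \mathcal{I}(y;x|h) + h'(\mathbf{y}|\mathbf{x},\mathbf{h}) - h'(\mathbf{y}|\mathbf{x}) \right\}\Big|_{\mathrm{CM},\sigma_x^2} \nonumber\\
&\overset{(a)}{=} \mathcal{I}(y;x|h)\big|_{\mathrm{CM},\sigma_x^2} - \int_{-1/2}^{1/2} \log\left(\frac{\sigma_x^2}{\sigma_n^2} S_h(f) + 1\right) df \tag{127}
\end{align}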



where for (a) we have used (32) and the fact that for constant modulus input symbols $h'(\mathbf{y}|\mathbf{x})$ is equal
to (45), see (46). Furthermore, $\mathcal{I}(y;x|h)|_{\mathrm{CM},\sigma_x^2}$ corresponds to the coherent mutual information using
circularly symmetric constant modulus input symbols with power $\sigma_x^2$.

Hence, we have found a lower bound on the capacity that is achievable with i.i.d. constant modulus
input symbols with a uniformly distributed phase. However, as far as we know there is no closed form
solution for the first term in (127), i.e., $\mathcal{I}(y;x|h)|_{\mathrm{CM},\sigma_x^2}$, so it has to be calculated numerically. In addition,
for nominal peak-to-average power ratios $\beta > 1$ this bound is in general not tight. The lower bound (127)
in combination with (84) is shown in Fig. 3. As it is based on constant modulus signaling, this bound
becomes loose with an increasing SNR. The lower bound in (127) corresponds to the lower bound on the
peak power constrained capacity given in [2, (34)].

Using the well-known time-sharing argument, the lower bound on capacity given in (127) can be
enhanced. The time-sharing argument is based on the fact that, while keeping the average transmit
power constant, using the channel only during a fraction of the time might lead to a higher achievable
rate. Using this time-sharing argument, a lower bound on the peak power constrained capacity for input
distributions with an average power $\sigma_x^2$ and a nominal peak-to-average power ratio $\beta$ is consequently given
by the following expression:

\begin{equation}
\sup_{\mathcal{P}_{\mathrm{peak}}} \mathcal{I}'(\mathbf{y};\mathbf{x}) \ge \max_{\gamma\in[1,\beta]} \left\{ \frac{1}{\gamma}\,\mathcal{I}(y;x|h)\big|_{\mathrm{CM},\gamma\sigma_x^2} - \frac{1}{\gamma}\int_{-1/2}^{1/2} \log\left(\frac{\gamma\sigma_x^2}{\sigma_n^2} S_h(f) + 1\right) df \right\}. \tag{128}
\end{equation}

This lower bound exactly corresponds to the lower bound on the peak power constrained capacity given
in [2, (34)/(29)] (see footnote 8). As the lower bound in (128) does not hold for i.i.d. input symbols due to the application
of the time-sharing argument, it would be unfair to use it for comparison in Fig. 3. Thus, in Fig. 3, (127)
is shown.

Note that in contrast to the lower bound on the achievable rate for i.i.d. zero-mean proper Gaussian
input distributions in (82), the lower bound in (128) does not converge to the coherent capacity for
asymptotically small channel dynamics, i.e., $f_d \to 0$, as the coherent mutual information rate, which
equals $\mathcal{I}(y;x|h)$, with any peak power limited input distribution is smaller than the coherent capacity,
which is achieved for i.i.d. zero-mean proper Gaussian input symbols, cf. (31). This is also one advantage
of our study of bounds on the achievable rate with i.i.d. zero-mean proper Gaussian input symbols. As
the coherent capacity is achieved by this input distribution, this approach allows us to give a lower bound
on the achievable rate which becomes tight for asymptotically small channel dynamics.

IV. Alternative Upper Bound on the Achievable Rate with I.I.D. Input Symbols Based 
on the One-Step Channel Prediction Error Variance 

In Section III we have derived bounds on the achievable rate with i.i.d. zero-mean proper Gaussian
input symbols and have also discussed their link to capacity bounds for peak power constrained input
symbols given in [2] and [1]. These bounds are based on a purely mathematical derivation and do not
give any link to a physical interpretation like the channel prediction error variance as it has been used
in [3]. In the present section, we give a new upper bound on the achievable rate which is based on the
channel prediction error variance and is also not restricted to peak power constrained input symbols.
In contrast, for the derivation of the channel prediction based capacity bounds in [3], the peak power
constraint has been required for technical reasons. However, for this derivation we have to restrict to i.i.d.
input symbols, which has not been required for the derivation of the upper bound on the achievable rate in
Section III (see footnote 9). As no peak power constraint is required for the derivation of the upper bound in the present
section, we are able to evaluate the new upper bound also for i.i.d. zero-mean proper Gaussian input
symbols. Additionally, we will also evaluate the upper bound for peak power constrained input symbols.
However, due to the required restriction to i.i.d. input symbols, the resulting upper bound with peak power
constrained input symbols is not an upper bound on the peak power constrained capacity, but only on the
achievable rate with i.i.d. input symbols and a peak power constraint. In contrast to the upper bound on
the achievable rate with i.i.d. zero-mean proper Gaussian input symbols given in Section III-D2, which
holds only for a rectangular PSD of the fading process, the upper bound given now holds for an arbitrary
PSD with compact support.

In the first part of the following derivation, we only restrict to i.i.d. input symbols with a maximum
average power $\sigma_x^2$. Any other restriction on the input symbols, either to zero-mean proper Gaussian symbols
or a peak power constraint, will be applied later. Thus, we define the set of all i.i.d. input distributions

8 Note that it would also be possible to enhance the lower bound on the capacity for zero-mean proper Gaussian inputs in (82) based on
the time-sharing argument, i.e., by discarding the restriction to identically distributed input symbols. However, as for the derivation of the
upper bound on the achievable rate in (85) we need the restriction to i.d. input symbols, such a lower bound without the assumption on i.d.
input symbols would not match this upper bound. Therefore, we do not consider this further.

9 Note that in Section III the upper bound has only been evaluated for i.i.d. zero-mean proper Gaussian input symbols in the final step; a
restriction to independent input symbols is not required for the derivation itself.
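with a maximum average power $\sigma_x^2$ as

\begin{equation}
\mathcal{P}_{\mathrm{i.i.d.}} = \left\{ p(\mathbf{x}) \;\middle|\; \mathbf{x}\in\mathbb{C}^N,\; p(\mathbf{x}) = \prod_{i=1}^{N} p(x_i),\; p(x_i) = p(x_j)\;\forall i,j,\; \mathrm{E}\left[|x_k|^2\right] \le \sigma_x^2\;\forall k \right\}. \tag{129}
\end{equation}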

A. Achievable Rate based on Channel Prediction 

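Corresponding to Section III, we express the mutual information rate $\mathcal{I}'(\mathbf{y};\mathbf{x})$ based on the separation in
(27). As previously stated, we construct an upper bound on the achievable rate based on channel prediction.
As the channel fading process is stationary and ergodic, and as we assume i.i.d. input symbols, we can
rewrite $h'(\mathbf{y}|\mathbf{x})$ as follows:

\begin{align}
h'(\mathbf{y}|\mathbf{x}) &= \lim_{N\to\infty} \frac{1}{N} h(\mathbf{y}|\mathbf{x}) \nonumber\\
&\overset{(a)}{=} \lim_{N\to\infty} \frac{1}{N} \sum_{k=1}^{N} h\left(y_k \,\middle|\, \mathbf{x}, \mathbf{y}_1^{k-1}\right) \nonumber\\
&\overset{(b)}{=} \lim_{N\to\infty} \frac{1}{N} \sum_{k=1}^{N} h\left(y_k \,\middle|\, \mathbf{x}_1^{k}, \mathbf{y}_1^{k-1}\right) \nonumber\\
&\overset{(c)}{=} \lim_{N\to\infty} h\left(y_N \,\middle|\, \mathbf{x}_1^{N}, \mathbf{y}_1^{N-1}\right) \tag{130}
\end{align}

where, e.g., the vector $\mathbf{y}_1^{N-1}$ contains all channel output symbols from the time instant 1 to the time
instant $N-1$. Here, for (a) we have used the chain rule for differential entropy, and (b) is based on the fact
that $y_k$ conditioned on $\mathbf{y}_1^{k-1}$ and $\mathbf{x}_1^{k}$ is independent of the symbols $\mathbf{x}_{k+1}^{N}$ due to the independence of the
transmit symbols. Equality (c) follows from the ergodicity and stationarity of the channel fading process
and the assumption on independent transmit symbols, see [28, Chapter 4.2]. Correspondingly, $h'(\mathbf{y})$ can
be rewritten as follows:

\begin{equation}
h'(\mathbf{y}) = \lim_{N\to\infty} h\left(y_N \,\middle|\, \mathbf{y}_1^{N-1}\right). \tag{131}
\end{equation}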



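Thus, based on (130) and (131), the achievable rate is given by

\begin{equation}
\mathcal{I}'(\mathbf{y};\mathbf{x}) = \lim_{N\to\infty} \left\{ h\left(y_N \,\middle|\, \mathbf{y}_1^{N-1}\right) - h\left(y_N \,\middle|\, \mathbf{x}_1^{N}, \mathbf{y}_1^{N-1}\right) \right\} \tag{132}
\end{equation}

which we name prediction separation of the mutual information rate.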

B. An Upper Bound based on the Channel Prediction Error Variance 

Now, we will upper-bound the achievable rate based on the expression in (132).

1) Upper Bound on $h'(\mathbf{y})$: As conditioning reduces entropy, we can upper-bound $h(y_N|\mathbf{y}_1^{N-1})$ in (131)
by

\begin{equation}
h\left(y_N \,\middle|\, \mathbf{y}_1^{N-1}\right) \le h(y_N). \tag{133}
\end{equation}

Using (131), (133), ergodicity, and stationarity, we get

\begin{equation}
h'(\mathbf{y}) \le h(y_N) \overset{(a)}{\le} \log\left(\pi e \left(\alpha\sigma_x^2\sigma_h^2 + \sigma_n^2\right)\right) = h'_U(\mathbf{y}) \tag{134}
\end{equation}

where for (a) we used the fact that proper Gaussian distributions maximize entropy and that the average
transmit power is given by $\alpha\sigma_x^2$ with $\alpha \in [0, 1]$. Using an average transmit power of $\alpha\sigma_x^2$ still allows us to
choose average transmit powers smaller than the maximum average transmit power $\sigma_x^2$.

Obviously, the upper bound on $h'(\mathbf{y})$ in (134) is equal to the upper bound in (36), except for the factor
$\alpha$, which we introduced here to account for average transmit powers smaller than the maximum average
transmit power $\sigma_x^2$. This is relevant in case of peak power constrained input symbols, see Section III-G1.
2) The Entropy Rate $h'(\mathbf{y}|\mathbf{x})$: In the following, we lower-bound $h'(\mathbf{y}|\mathbf{x})$ based on the channel prediction
representation in (130). This lower-bounding approach of $h'(\mathbf{y}|\mathbf{x})$ is completely different from the one used
in Section III-C2. Within the present section we express $h(y_N|\mathbf{x}_1^{N}, \mathbf{y}_1^{N-1})$ at the RHS of (130)
based on the one-step channel prediction error variance. As the following argumentation will show, the
channel output $y_N$ conditioned on $\mathbf{x}_1^{N}, \mathbf{y}_1^{N-1}$ is proper Gaussian and, thus, fully characterized by its
conditional mean and conditional variance. The conditional mean is given by

\begin{align}
\mathrm{E}\left[y_N \,\middle|\, \mathbf{x}_1^{N}, \mathbf{y}_1^{N-1}\right]
&= \mathrm{E}\left[x_N h_N + n_N \,\middle|\, \mathbf{x}_1^{N}, \mathbf{y}_1^{N-1}\right] \nonumber\\
&= x_N\, \mathrm{E}\left[h_N \,\middle|\, \mathbf{x}_1^{N-1}, \mathbf{y}_1^{N-1}\right] = x_N \hat{h}_N \tag{135}
\end{align}

where $\hat{h}_N$ is the MMSE estimate of $h_N$ based on the channel output observations at all previous time
instances and the channel input symbols at these time instances. Based on $\hat{h}_N$, the channel output $y_N$ can
be written as

\begin{equation}
y_N = x_N h_N + n_N = x_N\left(\hat{h}_N + e_N\right) + n_N \tag{136}
\end{equation}

with the prediction error given by

\begin{equation}
e_N = h_N - \hat{h}_N. \tag{137}
\end{equation}

As both the noise and the fading process are jointly proper Gaussian, the MMSE estimate is
equivalent to the linear minimum mean squared error (LMMSE) estimate. Thus, $\hat{h}_N$ and $h_N$ are jointly proper
Gaussian and it follows that the estimation error $e_N$ is zero-mean proper Gaussian. Note that here $\hat{h}_N$
also has zero mean.

As $e_N$ is proper Gaussian, it can easily be seen from (136) that $y_N$ conditioned on $\mathbf{x}_1^{N}, \mathbf{y}_1^{N-1}$ is also
proper Gaussian. Thus, for the evaluation of $h(y_N|\mathbf{x}_1^{N}, \mathbf{y}_1^{N-1})$, we calculate the conditional variance of
the channel output $y_N$, which is given by
where

is the prediction error variance of the MMSE estimator for $h_N$. For (a) we have used the fact that the zero-mean
estimation error is orthogonal to and, thus, independent of the observations $\mathbf{y}_1^{N-1}$. However, the
prediction error variance depends on the input symbols $\mathbf{x}_1^{N-1}$, which is indicated by writing $\sigma^2_{e_{\mathrm{pred}}}(\mathbf{x}_1^{N-1})$.

Based on the channel prediction error variance, we can rewrite the entropy $h(y_N|\mathbf{x}_1^{N}, \mathbf{y}_1^{N-1})$ as
With (130) and (140), we get for i.i.d. input symbols

where $\sigma^2_{e_{\mathrm{pred}},\infty}(\mathbf{x}_{-\infty}^{k-1})$ is the prediction error variance in (139) for an infinite number of channel observations
in the past, i.e.,

which is indicated by writing $\sigma^2_{e_{\mathrm{pred}},\infty}(\mathbf{x}_{-\infty}^{k-1})$. Note that we have switched the notation and now predict
at the time instant $k$ instead of predicting at the time instant $N$. This is possible, as the channel fading
process is stationary, the input symbols are assumed to be i.i.d., and as we consider an infinitely long past.

3) Upper Bound on the Achievable Rate: With (132), (133), (134), and (140)/(141), we can give the
following upper bound on the achievable rate with i.i.d. input symbols:

\begin{equation}
\mathcal{I}'(\mathbf{y};\mathbf{x}) \le \log(\alpha\rho+1) - \mathrm{E}_{x_k}\left[ \mathrm{E}_{\mathbf{x}_{-\infty}^{k-1}}\left[ \log\left(1 + \frac{\sigma^2_{e_{\mathrm{pred}},\infty}(\mathbf{x}_{-\infty}^{k-1})}{\sigma_n^2}\,|x_k|^2\right) \right] \right] \tag{143}
\end{equation}

where $\rho$ is the nominal average SNR, see (11). Obviously, the upper bound in (143) still depends on the
channel prediction error variance $\sigma^2_{e_{\mathrm{pred}},\infty}(\mathbf{x}_{-\infty}^{k-1})$ given in (142), which itself depends on the distribution of
the input symbols in the past. Effectively, $\sigma^2_{e_{\mathrm{pred}},\infty}(\mathbf{x}_{-\infty}^{k-1})$ is itself a random quantity. For infinite transmission
lengths, i.e., $N \to \infty$, its distribution is independent of the time instant $k$, as the channel fading process
is stationary and as the transmit symbols are i.i.d.

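4) The Prediction Error Variance $\sigma^2_{e_{\mathrm{pred}},\infty}(\mathbf{x}_{-\infty}^{k-1})$: The prediction error variance $\sigma^2_{e_{\mathrm{pred}},\infty}(\mathbf{x}_{-\infty}^{k-1})$ in (142)
depends on the distribution of the input symbols $\mathbf{x}_{-\infty}^{k-1}$. To construct an upper bound on the RHS of (143),
we need to find a distribution of the transmit symbols in the past, i.e., $\mathbf{x}_{-\infty}^{k-1}$, which leads to a distribution
of $\sigma^2_{e_{\mathrm{pred}},\infty}(\mathbf{x}_{-\infty}^{k-1})$ that maximizes the RHS of (143). Therefore, we have to express the channel prediction
error variance $\sigma^2_{e_{\mathrm{pred}},\infty}(\mathbf{x}_{-\infty}^{k-1})$ as a function of the transmit symbols in the past, i.e., $\mathbf{x}_{-\infty}^{k-1}$. In a first step,
we give such an expression for the case of a finite past time horizon, i.e., for $\sigma^2_{e_{\mathrm{pred}}}(\mathbf{x}_1^{N-1})$ in (139), which
can be expressed by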
where $\mathbf{R}_{\mathbf{y}_1^{N-1}|\mathbf{x}_1^{N-1}}$ is the correlation matrix of the observations $\mathbf{y}_1^{N-1}$ while the past transmit symbols
$\mathbf{x}_1^{N-1}$ are known, i.e.,

\begin{equation}
\mathbf{R}_{\mathbf{y}_1^{N-1}|\mathbf{x}_1^{N-1}} = \mathbf{X}_{N-1}\mathbf{R}_h\mathbf{X}_{N-1}^H + \sigma_n^2\mathbf{I}_{N-1} \tag{145}
\end{equation}

with $\mathbf{X}_{N-1}$ being a diagonal matrix containing the past transmit symbols such that $\mathbf{X}_{N-1} = \mathrm{diag}(\mathbf{x}_1^{N-1})$.
In addition, $\mathbf{R}_h$ is the autocorrelation matrix of the channel fading process

where $\mathbf{h}_1^{N-1}$ contains the fading weights from time instant 1 to $N-1$. The cross correlation vector
$\mathbf{r}_{\mathbf{y}_1^{N-1}h_N|\mathbf{x}_1^{N-1}}$ between the observation vector $\mathbf{y}_1^{N-1}$ and the fading weight $h_N$ while knowing the past
transmit symbols $\mathbf{x}_1^{N-1}$ is given by

with $\mathbf{r}_{h,\mathrm{pred}} = [r_h(-(N-1)) \ldots r_h(-1)]^T$, where $r_h(l)$ is the autocorrelation function as defined earlier.

Substituting (145) and (147) into (144) yields

where for (a) we have used $\mathbf{Z} = \mathbf{X}_{N-1}^H\mathbf{X}_{N-1}$; $\mathbf{Z}$ is a diagonal matrix containing the powers of the
individual transmit symbols in the past from time instant 1 to $N-1$. For ease of notation, we omit the
index $N-1$ (see footnote 10).

Remember that we want to derive an upper bound on the achievable rate with i.i.d. input symbols
by maximizing the RHS of (143) over all i.i.d. distributions of the transmit symbols in the past with
an average power $\alpha\sigma_x^2$. Obviously, the distribution of the phases of the past transmit symbols $\mathbf{x}_1^{N-1}$ has
no influence on the channel prediction error variance $\sigma^2_{e_{\mathrm{pred}}}(\mathbf{x}_1^{N-1})$. Thus, it remains to evaluate for which
distribution of the power of the past transmit symbols the RHS of (143) is maximized. In the following,
we will show that the RHS of (143) is maximized in case the past transmit symbols have a constant
power $\alpha\sigma_x^2$. That is, calculating the prediction error variance under the assumption that the past transmit
symbols are constant modulus symbols with transmit power $|x_k|^2 = \alpha\sigma_x^2$ maximizes the RHS of (143)
over all i.i.d. input distributions for the given average power constraint of $\alpha\sigma_x^2$.

To prove this statement, we use the fact that the expression in the expectation operation at the RHS of
(143) (but here for the case of a finite past time horizon) with (148), i.e.,

is convex with respect to each individual element of the diagonal of $\mathbf{Z}$, which we name $\mathbf{z}$. The proof of
convexity of (149) is given in Appendix D. Based on this convexity and Jensen's inequality, we get

where $\sigma^2_{e_{\mathrm{pred}},\mathrm{CM}}$ is the channel prediction error variance in case all past transmit symbols are constant
modulus symbols with power $\alpha\sigma_x^2$. Here, the index CM denotes constant modulus.

As this lower-bounding of (149) can be performed for an arbitrary $N$, i.e., for an arbitrarily long past,
we can also conclude that the RHS of (143) is upper-bounded by

\begin{equation}
\mathcal{I}'(\mathbf{y};\mathbf{x}) \le \log(\alpha\rho+1) - \mathrm{E}_{x_k}\left[ \log\left(1 + \frac{\sigma^2_{e_{\mathrm{pred}},\mathrm{CM},\infty}}{\sigma_n^2}\,|x_k|^2\right) \right] \tag{151}
\end{equation}

where $\sigma^2_{e_{\mathrm{pred}},\mathrm{CM},\infty}$ is the channel prediction error variance in case all past transmit symbols are constant
modulus symbols with power $\alpha\sigma_x^2$ and an infinitely long past observation horizon. In this case, the
prediction error variance is no longer a random quantity but is constant for all time instances $k$.

10 Note that the inverse of $\mathbf{Z}$ in (148) does not exist if a diagonal element $z_i$ of the diagonal matrix $\mathbf{Z}$ is zero, i.e., one transmit symbol
has zero power. However, as the prediction error variance is continuous in $z_i = 0$ for all $i$, this does not lead to problems in the following
derivation.
Constant modulus symbols are in general not the capacity maximizing input distribution. However, we
only use them to find a distribution of $\sigma^2_{e_{\mathrm{pred}},\infty}(\mathbf{x}_{-\infty}^{k-1})$ that maximizes (143).
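The role of convexity and Jensen's inequality can be illustrated with a deliberately small toy case that is not part of the derivation itself: a single past observation ($N - 1 = 1$), $\sigma_h^2 = \sigma_n^2 = 1$, and the standard LMMSE error variance $\sigma_h^2 - |r_h(1)|^2/(\sigma_h^2 + \sigma_n^2/z)$, where $z$ is the power of the past transmit symbol. All function names and parameter values below are hypothetical. The sketch compares the expected penalty term for an exponentially distributed past power (as for a proper Gaussian symbol) with the penalty for a constant modulus past symbol of the same mean power.

```python
import math


def pred_error_var(z, r1, sigma_h2=1.0, sigma_n2=1.0):
    """LMMSE prediction error variance of h_N from one past observation.

    z is the power of the single past transmit symbol and r1 the fading
    autocorrelation at lag 1 (toy setting with N - 1 = 1).
    """
    if z == 0.0:
        return sigma_h2  # a zero-power symbol carries no channel information
    return sigma_h2 - abs(r1) ** 2 / (sigma_h2 + sigma_n2 / z)


def penalty(z, r1, p_now, sigma_n2=1.0):
    """Term inside the expectation: log(1 + |x_N|^2 * sigma_e^2(z) / sigma_n^2)."""
    return math.log1p(p_now * pred_error_var(z, r1, sigma_n2=sigma_n2) / sigma_n2)


def mean_penalty_exponential(mean_power, r1, p_now, u_max=60.0, steps=60000):
    """E[penalty(z)] for exponentially distributed z with the given mean (Simpson rule)."""
    h = u_max / steps
    total = 0.0
    for i in range(steps + 1):
        u = i * h  # u = z / mean_power follows a unit-mean exponential law
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * penalty(mean_power * u, r1, p_now) * math.exp(-u)
    return total * h / 3.0
```

With $r_h(1) = 0.9$ and unit powers, the expected penalty for the random past power exceeds the constant modulus value `penalty(1.0, 0.9, 1.0)`, so the constant modulus assumption for the past symbol yields the smaller subtracted term and hence the larger bound, as claimed.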

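For constant modulus input symbols with power $\alpha\sigma_x^2$ and an infinitely long past, the prediction error
variance is given by

\begin{equation}
\sigma^2_{e_{\mathrm{pred}},\mathrm{CM},\infty} = \frac{\sigma_n^2}{\alpha\sigma_x^2}\left\{ \exp\left( \int_{-1/2}^{1/2} \log\left(1 + \frac{\alpha\sigma_x^2}{\sigma_n^2} S_h(f)\right) df \right) - 1 \right\}. \tag{152}
\end{equation}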



As far as we know, the upper bound on the achievable rate in (151) is new. The innovation in the
derivation of this bound lies in the fact that we separate the input symbols into the one at the time instant
$k$, i.e., $x_k$, and the previous input symbols contained in $\mathbf{x}_{-\infty}^{k-1}$. The latter ones are only relevant to calculate the
prediction error variance, which itself is a random variable depending on the distribution of the past
transmit symbols. To derive an upper bound on the achievable rate with i.i.d. input distributions, we have
shown that the achievable rate is upper-bounded if the prediction error variance is calculated under the
assumption that all past transmit symbols are constant modulus input symbols. As the assumption on
constant modulus symbols is only used in the context of the prediction error variance, the upper bound
on the achievable rate still holds for any i.i.d. input distribution with the given average power constraint.
This allows us to evaluate this bound also for the case of i.i.d. zero-mean proper Gaussian input symbols.

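5) Proper Gaussian Input Symbols: Evaluating (151) for i.i.d. zero-mean proper Gaussian (PG) input
symbols yields

\begin{align}
\mathcal{I}'(\mathbf{y};\mathbf{x})\big|_{\mathrm{PG}}
&\le \sup_{\alpha\in[0,1]} \left\{ \log(\alpha\rho+1) - \int_0^\infty \log\left(1 + \frac{\sigma^2_{e_{\mathrm{pred}},\mathrm{CM},\infty}}{\sigma_h^2}\,\alpha\rho z\right) e^{-z}\,dz \right\} \nonumber\\
&\overset{(a)}{\le} \sup_{\alpha\in[0,1]} \left\{ \log(\alpha\rho+1) - \int_0^\infty \log\left(1 + \frac{\sigma^2_{e_{\mathrm{pred}},\mathrm{CM},\infty}\big|_{\alpha=1}}{\sigma_h^2}\,\alpha\rho z\right) e^{-z}\,dz \right\} \nonumber\\
&\overset{(b)}{=} \log(\rho+1) - \int_0^\infty \log\left(1 + \frac{\sigma^2_{e_{\mathrm{pred}},\mathrm{CM},\infty}\big|_{\alpha=1}}{\sigma_h^2}\,\rho z\right) e^{-z}\,dz
= \mathcal{I}'_U(\mathbf{y};\mathbf{x})\big|_{\mathrm{pred},\mathrm{PG}} \tag{153}
\end{align}

where (a) is based on the fact that $\sigma^2_{e_{\mathrm{pred}},\mathrm{CM},\infty}$ monotonically decreases with an increasing $\alpha$ and, thus,
that the expression is not decreased if the prediction error variance is calculated for
$\alpha = 1$, which is denoted by writing $\sigma^2_{e_{\mathrm{pred}},\mathrm{CM},\infty}|_{\alpha=1}$. Furthermore, (b) follows from the monotonicity in $\alpha$ of
the argument of the supremum after step (a), which can be shown analogously to the
monotonicity of (88) based on (90). In conclusion, this means that the upper bound for zero-mean proper
Gaussian input symbols is maximized for the maximum average transmit power $\sigma_x^2$.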

As the coherent mutual information rate $\mathcal{I}'(\mathbf{y};\mathbf{x}|\mathbf{h})$ upper-bounds $\mathcal{I}'(\mathbf{y};\mathbf{x})$, we can enhance the upper
bound in (153) analogously to (87) with $\mathcal{I}'(\mathbf{x};\mathbf{y}|\mathbf{h})$ given in (31).

Fig. 4 shows the prediction based upper bound on the achievable rate with i.i.d. zero-mean proper
Gaussian input symbols given in (153) in comparison to the upper and lower bound on the achievable rate
with i.i.d. zero-mean proper Gaussian inputs given in Section III-D for a rectangular PSD of the channel
fading process. Both upper bounds are shown in combination with the coherent upper bound, i.e., (87)
and (31). A comparison of the prediction based upper bound (153)/(87) and the bound given in (85)/(87)
shows that it depends on the channel parameters which one is tighter. It can easily be shown that for
$f_d \to 0$ and for $f_d = 0.5$ both bounds, i.e., (85) and (153), are equal. For other $f_d$ it depends on the SNR
$\rho$ which bound is tighter. An analytical comparison turns out to be difficult as in both cases we use a
different way of lower-bounding $h'(\mathbf{y}|\mathbf{x})$.
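The statement that both bounds coincide at $f_d = 0.5$ can be checked numerically. The sketch below assumes a rectangular PSD, for which the constant modulus prediction error variance at $\alpha = 1$ takes the closed form $\sigma^2_{e_{\mathrm{pred}},\mathrm{CM},\infty}/\sigma_h^2 = \frac{1}{\rho}[(1 + \frac{\rho}{2f_d})^{2f_d} - 1]$ (the standard infinite-past prediction error formula evaluated for this PSD), and it uses the form of (85) that appears in footnote 7. Both forms and all function names are assumptions made for illustration; the coherent bound (87) is not included, and rates are in nats.

```python
import math


def exp_weighted_log(a, z_max=60.0, steps=60000):
    """Approximate the integral of log(1 + a*z) * exp(-z) over z in [0, inf).

    Composite Simpson rule on [0, z_max] (steps must be even).
    """
    h = z_max / steps
    total = 0.0
    for i in range(steps + 1):
        z = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * math.log1p(a * z) * math.exp(-z)
    return total * h / 3.0


def bound_85(rho, f_d):
    """Assumed form of the upper bound (85) for a rectangular PSD."""
    return math.log(rho + 1.0) - 2.0 * f_d * exp_weighted_log(rho / (2.0 * f_d))


def pred_error_var_cm(rho, f_d):
    """Normalized CM prediction error variance at alpha = 1 (rectangular PSD assumed)."""
    return ((1.0 + rho / (2.0 * f_d)) ** (2.0 * f_d) - 1.0) / rho


def bound_153(rho, f_d):
    """Assumed form of the prediction based upper bound (153)."""
    return math.log(rho + 1.0) - exp_weighted_log(pred_error_var_cm(rho, f_d) * rho)
```

At $f_d = 0.5$ the normalized prediction error variance equals one and `bound_153(rho, 0.5)` agrees with `bound_85(rho, 0.5)`; for other values of $f_d$ the two functions can be evaluated over a grid of $\rho$ to see which bound is tighter.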
Fig. 4. Comparison of the upper bound on the achievable rate with i.i.d. zero-mean proper Gaussian inputs based on channel prediction
(153)/(87) with the upper bound given in (85)/(87); in addition, the lower bound on the achievable rate with i.i.d. zero-mean proper Gaussian
inputs (83)/(84) is shown; rectangular PSD $S_h(f)$



6) Peak Power Constrained Input Distributions: Now, we consider the case of a peak power constraint
$P_{\mathrm{peak}}$ in addition to the average power constraint. With the nominal peak-to-average power ratio $\beta =
P_{\mathrm{peak}}/\sigma_x^2$, with (151) we get the following upper bound on the achievable rate with i.i.d. input symbols:

where $\mathcal{P}_{\mathrm{i.i.d.}}^{\mathrm{peak}}$ corresponds to $\mathcal{P}_{\mathrm{i.i.d.}}$ but with the additional peak power constraint $|x_k|^2 \le \beta\sigma_x^2$, and $\mathcal{P}_{\mathrm{i.i.d.}}|_\alpha$
corresponds to $\mathcal{P}_{\mathrm{i.i.d.}}$ in (129) but with the average transmit power fixed to $\alpha\sigma_x^2$. Inequality (a) can be shown
following an analogous argumentation as in Section III-G1 from (113) to (117). Note that the prediction
error variance $\sigma^2_{e_{\mathrm{pred}},\mathrm{CM},\infty}$ depends on $\alpha$. Now, we would have to calculate the supremum of the RHS of
(154) with respect to $\alpha$, which turns out to be difficult due to the dependency of $\sigma^2_{e_{\mathrm{pred}},\mathrm{CM},\infty}$ on $\alpha$.
However, as $\sigma^2_{e_{\mathrm{pred}},\mathrm{CM},\infty}$ monotonically decreases with an increasing $\alpha$, and as the RHS of (154) monotonically increases
with a decreasing prediction error variance, the RHS of (154) can be upper-bounded by evaluating the prediction error
variance for $\alpha = 1$, which leads to the bound in (155) with

\begin{equation}
\alpha_{\mathrm{opt}} = \min\left\{ 1,\; \left( \frac{1}{\beta}\log\left(1 + \frac{\sigma^2_{e_{\mathrm{pred}},\mathrm{CM},\infty}\big|_{\alpha=1}}{\sigma_h^2}\,\rho\beta\right) \right)^{-1} - \frac{1}{\rho} \right\}. \tag{156}
\end{equation}

For (156), we have used the fact that the argument of the supremum in the first line of (155) is concave
in $\alpha$ and, thus, there exists a unique maximum.

a) Comparison to Capacity Bounds in [1] and [2]: In the following, we will compare the upper
bound on the achievable rate with peak power constrained i.i.d. input symbols in (155) with the upper
bound on the peak power constrained capacity given in [1, Prop. 2.2], which is, modified to our notation,
given by

\begin{equation}
C \le \log(\alpha_{\mathrm{opt}}\rho + 1) - \frac{\alpha_{\mathrm{opt}}}{\beta} \int_{-1/2}^{1/2} \log\left(\rho\beta\frac{S_h(f)}{\sigma_h^2} + 1\right) df \tag{157}
\end{equation}

with

\begin{equation}
\alpha_{\mathrm{opt}} = \min\left\{ 1,\; \left( \frac{1}{\beta} \int_{-1/2}^{1/2} \log\left(\rho\beta\frac{S_h(f)}{\sigma_h^2} + 1\right) df \right)^{-1} - \frac{1}{\rho} \right\}. \tag{158}
\end{equation}

Note that, in terms of the analytical expression, (157) corresponds to (120) for the special case of a
rectangular PSD $S_h(f)$, see the discussion in Section III-G1.
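As a minimal sketch of how (157) and (158) can be evaluated for an arbitrary PSD, the following code approximates the integral over $f \in [-1/2, 1/2]$ with a midpoint rule. The PSD is passed as a function returning $S_h(f)/\sigma_h^2$; the function names, the grid size, and the rectangular example PSD are illustrative assumptions, and rates are in nats.

```python
import math


def psd_integral(psd_over_var, rho, beta, steps=20000):
    """Midpoint-rule approximation of the integral over f in [-1/2, 1/2] of
    log(rho * beta * S_h(f) / sigma_h^2 + 1)."""
    df = 1.0 / steps
    total = 0.0
    for i in range(steps):
        f = -0.5 + (i + 0.5) * df
        total += math.log1p(rho * beta * psd_over_var(f))
    return total * df


def capacity_upper_bound(psd_over_var, rho, beta):
    """Return (alpha_opt, bound) following the structure of (158) and (157).

    Assumes a PSD that is not identically zero, so that the integral is positive.
    """
    integral = psd_integral(psd_over_var, rho, beta)
    alpha_opt = min(1.0, beta / integral - 1.0 / rho)
    bound = math.log(alpha_opt * rho + 1.0) - (alpha_opt / beta) * integral
    return alpha_opt, bound


def rectangular_psd(f_d):
    """Normalized rectangular PSD S_h(f) / sigma_h^2 with maximum Doppler frequency f_d."""
    return lambda f: 1.0 / (2.0 * f_d) if abs(f) <= f_d else 0.0
```

For instance, `capacity_upper_bound(rectangular_psd(0.1), 100.0, 2.0)` uses the full average power ($\alpha_{\mathrm{opt}} = 1$), whereas for fast fading with $\beta = 1$, e.g. `rectangular_psd(0.4)` at the same SNR, the optimum lies strictly below one.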

On the other hand, we compare the upper bound on the achievable rate with peak power constrained
i.i.d. input symbols in (155) with the lower bound on the peak power constrained capacity given in [2,
(35)]



_• 



CM = h(y k \h k ) - log nea 2 n 1 + df (159) 



where $k$ is an arbitrarily chosen time instant with an infinitely long past and $h(y_k|\hat{h}_k)$ is the differential
output entropy while conditioning on the channel estimate $\hat{h}_k$, being given by the MMSE estimate
$\mathrm{E}[h_k|\mathbf{x}_{-\infty}^{k-1}, \mathbf{y}_{-\infty}^{k-1}]$, which is linear due to the fact that this estimation problem is jointly proper Gaussian.
Based on the time-sharing argument, see Section III-G2, an enhanced lower bound on the peak power
constrained capacity is given by [2, (29)/(35)]

C > max -CM). (160) 

7£[l,/3] 7 

b) Numerical Evaluation: Fig. 5 shows the upper bound on the achievable rate with i.i.d. input
symbols and a peak power constraint based on the channel prediction error variance in (155)/(87) in
comparison to the upper bound on the peak power constrained capacity given in [1, Prop. 2.2], i.e., (157),
combined with (87), with $\beta = 2$ for both. For comparison we use the lower bound on the peak power
constrained capacity given in [2, (35)], i.e., (159), based on a constant modulus input distribution with
100 discrete signaling points with a uniform angular spacing. This approximates the case of a uniformly
distributed phase. This lower bound is shown without time-sharing and with time-sharing ($\gamma_{\mathrm{opt}}$), see (160)
and footnote 11. Note that the lower bound in (159) is achievable with constant modulus input symbols with a uniformly
distributed phase. Recall that time-sharing means that the transmitter uses the channel only a fraction $1/\gamma$
of the time. Obviously, time-sharing is not in accordance with the assumption on i.i.d. input symbols.

11 Concerning the relation of the lower bound on the peak power constrained capacity in [2, (35)/(29)], i.e., (160)/(159), and the one used
for comparison in Section III-G2, i.e., (127) and (128), respectively, see [2].

Fig. 5. Comparison of the upper bound on the achievable rate with i.i.d. symbols and a peak power constraint given in (155)/(87) based
on channel prediction to the upper bound on the peak power constrained capacity given in [1, Proposition 2.2], i.e., (157)/(87), for $\beta = 2$;
in addition, the lower bound on the peak power constrained capacity [2, (35)], i.e., (159), is shown for a constant modulus (CM) input
distribution with 100 signaling points, without and with time-sharing (160) ($\gamma_{\mathrm{opt}}$); rectangular PSD $S_h(f)$



Therefore, the lower bound without time-sharing matches the new upper bound on the achievable rate
with i.i.d. input symbols in (155)/(87), while the lower bound with time-sharing ($\gamma_{\mathrm{opt}}$) only
matches the capacity upper bound in [1, Prop. 2.2], i.e., (157)/(87). From Fig. 5 it can be seen that the
upper bound on the achievable rate with i.i.d. input symbols in (155)/(87) is lower than or equal to the
capacity upper bound in [1, Prop. 2.2], i.e., (157)/(87). However, (155)/(87) is only an upper bound on the
achievable rate with i.i.d. input symbols and not an upper bound on the capacity, as i.i.d. input symbols
are in general not capacity achieving, see 0~| and Section III-G1. This can also be seen from the fact that
the lower bound on the achievable rate with time-sharing is larger than the upper bound on the achievable
rate with i.i.d. input symbols in (155)/(87) for very low SNRs. Furthermore, it is worth mentioning that for
a nominal peak-to-average power ratio of $\beta = 1$, the upper bound in (155) and the one given in
[1, Prop. 2.2], i.e., (157), coincide. In addition, the prediction based upper bound on the achievable rate
in (155) as well as the capacity upper bound in [1, Prop. 2.2], i.e., (157), both become loose for
$\beta > 1$ and high SNR, or for very large $\beta$.
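The effect of time-sharing can be illustrated with a small numerical sketch. It assumes that the time-shared rate has the form $\frac{1}{\gamma} R(\gamma\rho)$ with $\gamma \in [1, \beta]$, i.e., the transmitter is active a fraction $1/\gamma$ of the time at $\gamma$ times the average power. The function `stand_in_rate` is a hypothetical placeholder chosen only because it grows quadratically at low SNR and logarithmically at high SNR; it is not the bound in (159), and all names are illustrative.

```python
import math

def stand_in_rate(rho, f_d):
    """Hypothetical rate in nat/channel use; a placeholder, NOT the bound (159)."""
    return math.log(1.0 + rho) - 2.0 * f_d * math.log(1.0 + rho / (2.0 * f_d))

def time_sharing(rate, rho, beta, steps=1000):
    """Maximize (1/gamma) * rate(gamma * rho) over a grid of gamma in [1, beta]."""
    best_gamma, best_rate = 1.0, rate(rho)
    for i in range(1, steps + 1):
        gamma = 1.0 + (beta - 1.0) * i / steps
        value = rate(gamma * rho) / gamma
        if value > best_rate:
            best_gamma, best_rate = gamma, value
    return best_gamma, best_rate

f_d, beta = 0.05, 2.0
for rho in (0.01, 100.0):
    gamma_opt, r = time_sharing(lambda s: stand_in_rate(s, f_d), rho, beta)
    print(f"rho={rho}: gamma_opt={gamma_opt:.3f}, rate={r:.5f} nat/cu")
```

For this placeholder, the search selects $\gamma = \beta$ at low SNR, where the rate is convex in the SNR, and $\gamma = 1$ at high SNR, which is in line with the observation that time-sharing only helps for very low SNRs.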

V. Comparison to Synchronized Detection with a Pilot Based Channel Estimation 

In typical mobile communication systems, periodic pilot symbols are inserted into the transmit
data sequence. The pilot symbol spacing $L$ is chosen such that the channel fading process is sampled at
least at the Nyquist rate, i.e., $L < \lfloor 1/(2 f_d) \rfloor$. Based on these pilot symbols the channel is
estimated, allowing for coherent detection (synchronized detection). In conventional receivers, the channel
estimation and the detection/decoding are two separate steps, such that the channel is estimated solely based
on the pilot symbols. The resulting channel estimation error process is temporally correlated. However,
when performing coherent detection, the information contained in this temporal correlation is discarded. For a
detailed discussion on this, we refer to [29]. The channel estimation error leads to an SNR degradation.

Fig. 6. Comparison of the bounds on the achievable rate with synchronized detection and a solely pilot based
channel estimate (SD) and the bounds on the achievable rate with i.i.d. zero-mean proper Gaussian (PG) input
symbols given in (85)/(87) and (83)/(84); the pilot spacing $L$ for synchronized detection is chosen such that
the lower bound in (161) is maximized; rectangular PSD $S_h(f)$



Bounds on the achievable rate for this separate processing have been given in 0. For i.i.d. zero-mean 
proper Gaussian data symbols these bounds become

where $\sigma_{e_{\mathrm{pil}}}^2$ is the channel estimation error variance when estimating the channel solely
based on pilot symbols, which is given by

Based on the lower bound in (161), it can easily be seen that the achievable rate is decreased in comparison
to perfect channel knowledge by two factors. First, symbol time instances that are used for pilot symbols
are lost for data symbols, leading to the pre-log factor $\frac{L-1}{L}$, and secondly, the average SNR is
decreased by the factor $\left(1 - \sigma_{e_{\mathrm{pil}}}^2/\sigma_h^2\right) / \left(1 + \rho\,\sigma_{e_{\mathrm{pil}}}^2/\sigma_h^2\right)$
due to the channel estimation error variance.
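The two loss factors can be made concrete with a short sketch. It assumes that the SNR degradation has the form $(1 - \epsilon)/(1 + \rho\epsilon)$, where $\epsilon$ is the estimation error variance normalized by $\sigma_h^2$, as read from the text above. The error variance is passed in as a free parameter, since its closed form is not reproduced here, and all function names are illustrative.

```python
import math

def max_pilot_spacing(f_d):
    """Upper limit floor(1/(2 f_d)) on the pilot spacing from the Nyquist condition."""
    return math.floor(1.0 / (2.0 * f_d))

def prelog_factor(L):
    """Fraction (L - 1)/L of the symbols that remain available for data."""
    return (L - 1) / L

def effective_snr(rho, rel_err_var):
    """Nominal SNR rho after degradation by the channel estimation error.

    rel_err_var is the estimation error variance divided by sigma_h^2,
    assumed to lie in [0, 1].
    """
    return rho * (1.0 - rel_err_var) / (1.0 + rho * rel_err_var)

f_d, rho, rel_err_var = 0.05, 10.0, 0.1
L = max_pilot_spacing(f_d)
print(L, prelog_factor(L), effective_snr(rho, rel_err_var))
```

With a perfect estimate (`rel_err_var = 0`) the effective SNR equals the nominal SNR, and it decreases monotonically as the estimation error grows.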

Fig. 6 shows a comparison of the bounds on the achievable rate with synchronized detection based on
a solely pilot based channel estimate in (161) and (162) with the bounds on the achievable rate with i.i.d.
zero-mean proper Gaussian input symbols given in Section III-D. For synchronized detection with a solely
pilot based channel estimation, the pilot spacing has been chosen such that the lower bound on the achievable
rate in (161) is maximized. As this lower bound is relatively tight, the chosen pilot spacing should be
close to the one that maximizes the achievable rate with synchronized detection using a solely pilot based
channel estimation. Obviously, for the practically important range of small channel dynamics, i.e.,
$f_d \ll 0.1$, the achievable rate with synchronized detection using a solely pilot based channel estimation
stays below the achievable rate with i.i.d. zero-mean proper Gaussian input symbols, indicating the possible
gain when using enhanced receiver structures. Even when pilot symbols are used, the receiver performance can
be enhanced by a joint processing of pilot and data symbols instead of a separate processing. For a more
detailed discussion on the difference between separate and joint processing we refer to [29]. That work
also gives a lower bound on the achievable rate with joint processing of pilot and data symbols. One
possibility for such a joint processing is an iterative code-aided channel estimation, where the channel
estimation is enhanced based on reliability information on the data symbols delivered by the decoder. Based
on this enhanced channel estimation, detection and decoding are performed again, see, e.g., [11] and [30].

VI. Conclusion 

The main focus of the present paper is the study of the achievable rate with i.i.d. zero-mean proper 
Gaussian input symbols on stationary Rayleigh flat-fading channels, where it is assumed that the receiver is 
aware of the law of the channel, but does not know its realization. We are interested in the achievable rate 
with i.i.d. zero-mean proper Gaussian input symbols, as this input distribution serves well to upper-bound 
the achievable rate with practical modulation and coding schemes. 

In the first part of this paper, i.e., in Section III, we have given a new upper bound on the achievable
rate for i.i.d. zero-mean proper Gaussian input symbols, which holds in case of a rectangular PSD of the
channel fading process. Furthermore, we also give a lower bound on the capacity which is achievable
with i.i.d. zero-mean proper Gaussian input symbols. This lower bound is already known from [16]. With
the upper and lower bound on the achievable rate for i.i.d. zero-mean proper Gaussian inputs, we have
found a set of bounds which is tight in the sense that their difference is bounded. We are able to bound
this gap analytically by $(1 + 2 f_d)\gamma$ with the Euler constant $\gamma \approx 0.577$ [nat/cu]. Thus, for
the specific case of proper Gaussian inputs we give bounds which are tight (in the sense given above) over the whole
SNR range. In contrast, available bounds on capacity often focus only on a specific SNR range, e.g., [Q]| 
discusses the low SNR regime whereas [jH considers the high SNR regime. 

The main novelty in this part of the paper lies in the new upper bound. It is based on a new lower bound
on the conditional channel output entropy rate $h'(\mathbf{y}|\mathbf{x})$ for the special case of a rectangular
PSD of the channel fading process. This bound is not based on a peak power constraint and, therefore, allows
us to give an upper bound on the achievable rate with i.i.d. zero-mean proper Gaussian inputs. To the best of
our knowledge, this is the only known upper bound on the achievable rate without a peak power constraint
which is tight in the sense that its slope (pre-log) corresponds to the slope of the lower bound on the
capacity. However, for the derivation of our upper bound on the achievable rate we need the restriction
to a rectangular PSD of the channel fading process.

Furthermore, the comparison of the bounds on the achievable rate with i.i.d. zero-mean proper Gaussian 
input symbols with the asymptotic bounds on the peak power constrained capacity given in [3] shows the 
interesting fact that the achievable rate with i.i.d. zero-mean proper Gaussian inputs is characterized by 
the same asymptotic high SNR slope as the peak power constrained capacity. This shows that this kind 
of input distribution is not highly suboptimal with respect to its high SNR performance. 

Moreover, we have discussed the relation of the bounds on the achievable rate with i.i.d. zero-mean
proper Gaussian input symbols to known bounds on the capacity with peak power constrained input
symbols given in [2] and [1]. In this regard, based on the given lower bound on $h'(\mathbf{y}|\mathbf{x})$, we
have also derived an upper bound on the achievable rate with identically distributed (i.d.) peak power
constrained input symbols, which is identified to be similar to an upper bound on capacity given in [1]. The
assumption of i.d. input symbols is required in the derivation of our lower bound on $h'(\mathbf{y}|\mathbf{x})$.
However, due to this restriction, our derivation does not allow us to show that the given upper bound on the
achievable rate is an upper bound on the peak power constrained capacity. Furthermore, our derivation
is restricted to a rectangular PSD of the channel fading process, whereas the upper bound on capacity
given in [1] holds for an arbitrary PSD of the channel fading process. Concerning the lower bounds, the
difference between the lower bound on the achievable rate with i.i.d. zero-mean proper Gaussian input symbols
and the lower bound on the peak power constrained capacity given in [2] results mainly from the coherent
mutual information, which is part of the lower bound in both cases.

In the second part of the present paper, i.e., in Section IV, we have derived an alternative upper bound
on the achievable rate with i.i.d. input symbols based on a prediction separation of the mutual information
rate. Based on this separation, the conditional channel output entropy rate $h'(\mathbf{y}|\mathbf{x})$ can be
expressed by the one-step channel prediction error variance, which is a well known result, see, e.g., [0. We
show for i.i.d. input symbols that the calculation of the prediction error variance
$\sigma^2_{e_{\mathrm{pred}},\infty}(\mathbf{x}_{-\infty}^{k-1})$ under the assumption of constant modulus symbols
yields an upper bound on the achievable rate. As the constant modulus assumption is only used in the context
of $\sigma^2_{e_{\mathrm{pred}},\infty}(\mathbf{x}_{-\infty}^{k-1})$, we can still give upper bounds on the achievable
rate for general i.i.d. input symbol distributions, even for the case without a peak power constraint. On the
one hand, we evaluate this upper bound for i.i.d. zero-mean proper Gaussian input symbols. Whether this upper
bound based on the channel prediction error variance given in (153) or the upper bound derived in
Section III-D2 is tighter depends on the channel parameters. However, the prediction based bound is
more general, as it holds for arbitrary PSDs of the fading process with compact support and is not limited
to rectangular PSDs like the one given in Section III-D2. On the other hand, we have evaluated the upper
bound on the achievable rate based on the prediction error variance for peak power constrained input
symbols. In this regard, we have observed that for nominal peak-to-average power ratios of $\beta = 2$ and
$\beta = 1$ this upper bound on the achievable rate with i.i.d. input symbols is lower than or equal to the
capacity upper bound in [1, Prop. 2.2]. However, it is not an upper bound on the capacity due to the
restriction to i.i.d. input symbols. We do not know if this ordering holds in general.

Finally, in Section V, we have compared the bounds on the achievable rate with i.i.d. zero-mean proper
Gaussian input symbols to bounds on the achievable rate with synchronized detection and a solely pilot 
based channel estimation. This comparison gives an indication of the possible gain when using enhanced 
receivers, e.g., receivers based on iterative code-aided channel estimation. 



Appendix A

Approximation of a Rectangular PSD by an Absolutely Summable Autocorrelation Function

In this appendix, we show that the rectangular PSD in ©, whose autocorrelation function is not
absolutely summable but only square summable, see ©, can be arbitrarily closely approximated by a
PSD with an absolutely summable autocorrelation function, see ©.

The discrete-time autocorrelation function $r_h(l)$ corresponding to the rectangular PSD
$S_h(f)|_{\mathrm{Rect}}$ in ©, which is given by

$$r_h(l) = \sigma_h^2\, \mathrm{sinc}(2 f_d l) \tag{164}$$

is not absolutely summable. However, the rectangular PSD can be arbitrarily closely approximated by a
PSD with a shape corresponding to the transfer function of a raised cosine filter, i.e.,

$$S_h(f)|_{\mathrm{RC}} = \begin{cases}
\frac{\sigma_h^2}{2 f_d}, & |f| \le (1 - \beta_{\mathrm{ro}}) f_d \\
\frac{\sigma_h^2}{4 f_d}\left[1 - \sin\left(\frac{\pi (|f| - f_d)}{2 \beta_{\mathrm{ro}} f_d}\right)\right], & (1 - \beta_{\mathrm{ro}}) f_d < |f| \le (1 + \beta_{\mathrm{ro}}) f_d \\
0, & f_d (1 + \beta_{\mathrm{ro}}) < |f| \le 0.5
\end{cases} \tag{165}$$

which for $\beta_{\mathrm{ro}} \to 0$ approaches the rectangular PSD $S_h(f)|_{\mathrm{Rect}}$. Furthermore, the
discrete-time autocorrelation function corresponding to $S_h(f)|_{\mathrm{RC}}$ is given by

$$r_h(l) = \sigma_h^2\, \mathrm{sinc}(2 f_d l)\, \frac{\cos(\beta_{\mathrm{ro}} \pi 2 f_d l)}{1 - (4 \beta_{\mathrm{ro}} f_d l)^2} \tag{166}$$

which for $\beta_{\mathrm{ro}} > 0$ is absolutely summable. Thus, the rectangular PSD in © can be arbitrarily closely
approximated by a PSD with an absolutely summable autocorrelation function.
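The different summability behavior can be checked numerically. The following sketch assumes the sinc autocorrelation of the rectangular PSD and the standard raised cosine autocorrelation with roll-off factor `beta_ro` (the function and parameter names are illustrative). It compares how much the two-sided sum of $|r_h(l)|$ still grows when the summation range is extended from $|l| \le 10^3$ to $|l| \le 10^5$.

```python
import math

def sinc(x):
    """Normalized sinc, sin(pi x)/(pi x), with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def r_rect(l, f_d, var_h=1.0):
    """Autocorrelation of the rectangular PSD."""
    return var_h * sinc(2.0 * f_d * l)

def r_rc(l, f_d, beta_ro, var_h=1.0):
    """Autocorrelation of the raised cosine shaped PSD (standard RC pulse)."""
    x = 4.0 * beta_ro * f_d * l
    denom = 1.0 - x * x
    if abs(denom) < 1e-12:
        # Removable singularity: cos(pi x / 2) / (1 - x^2) -> pi / 4 for x -> 1.
        return var_h * sinc(2.0 * f_d * l) * math.pi / 4.0
    return var_h * sinc(2.0 * f_d * l) * math.cos(math.pi * x / 2.0) / denom

def tail_abs_sum(r, n_from, n_to):
    """Two-sided sum of |r(l)| over n_from < |l| <= n_to."""
    return 2.0 * sum(abs(r(l)) for l in range(n_from + 1, n_to + 1))

f_d, beta_ro = 0.1, 0.3
growth_rect = tail_abs_sum(lambda l: r_rect(l, f_d), 1000, 100000)
growth_rc = tail_abs_sum(lambda l: r_rc(l, f_d, beta_ro), 1000, 100000)
print(f"growth of sum |r_h(l)|: rectangular {growth_rect:.3f}, raised cosine {growth_rc:.2e}")
```

For the rectangular PSD the partial sums keep growing roughly logarithmically, whereas for the raised cosine shape the terms decay like $1/l^3$ and the additional contribution is negligible.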



Appendix B 

Modified Upper Bound on h'(y) for Gaussian Inputs 

In this appendix, we derive an alternative upper bound on the channel output entropy rate $h'(\mathbf{y})$ for
the case of i.i.d. zero-mean proper Gaussian input symbols, which is tighter than the one given in (36).
This derivation is based on work given in [21], [31]. As its evaluation requires more complex numerical
methods, we do not use this bound further, but give it for completeness of presentation.

Obviously, an upper bound on the entropy rate $h'(\mathbf{y})$ is given by assuming an uncorrelated channel
fading process, i.e., its correlation matrix is assumed to be diagonal. This can be easily shown based on
the chain rule for differential entropy

$$h'(\mathbf{y}) = \lim_{N \to \infty} \frac{1}{N} h(\mathbf{y}) = \lim_{N \to \infty} \frac{1}{N} \sum_{k=1}^{N} h(y_k | \mathbf{y}_1^{k-1}) \overset{(a)}{\le} \lim_{N \to \infty} \frac{1}{N} \sum_{k=1}^{N} h(y_k) \overset{(b)}{=} h(y_k) \tag{167}$$



where for (a) we have used the fact that conditioning reduces entropy and (b) is due to the ergodicity
and the stationarity of the channel fading process and the assumption of i.i.d. input symbols. The major
difference between this upper bound and the upper bound given in (36) is that the latter one implicitly
corresponds to the case that the channel observations $y_k$ are proper Gaussian, while the RHS of (167) still
corresponds to the actual channel output entropy of the individual time instances. The upper bounding in
(167) only discards the temporal dependencies between the different observations.

In the following, we calculate the entropy $h(y_k)$ for the case of zero-mean proper Gaussian input
symbols with an average power $\sigma_x^2$

$$h(y_k) = -\mathrm{E}_{y_k}\left[\log\left(p(y_k)\right)\right]$$

$$\phantom{h(y_k)} = -\int \left(\int p(y_k|x_k)\, p(x_k)\, dx_k\right) \log\left(\int p(y_k|x_k)\, p(x_k)\, dx_k\right) dy_k$$

$h'_{U_2}(\mathbf{y})$



(168) 
(169) 



where $\gamma \approx 0.57721$ is the Euler constant. To the best of our knowledge, the first integral in (168)
cannot be calculated analytically. However, it can be evaluated numerically using Hermite polynomials
and Simpson's rule, see [21], [31], [32], or by Monte Carlo integration.

Fig. 7. Comparison of $\Delta_{h'(\mathbf{y}),2}$ with $\Delta_{h'(\mathbf{y})}$



To evaluate the tightness of $h'_{U_2}(\mathbf{y})$, Fig. 7 shows the difference

$$\Delta_{h'(\mathbf{y}),2} = h'_{U_2}(\mathbf{y}) - h'_L(\mathbf{y}) \tag{170}$$

in comparison to the difference $\Delta_{h'(\mathbf{y})}$ given in (88). Obviously, the upper bound
$h'_{U_2}(\mathbf{y})$ is tighter than the upper bound $h'_U(\mathbf{y})$ given in (36).
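As an illustration of the Monte Carlo evaluation mentioned above, the following sketch estimates the single-symbol output entropy $h(y_k)$ in nats. It assumes the model $y_k = x_k h_k + n_k$ with independent zero-mean proper Gaussian $x_k$, $h_k$, and $n_k$, so that $y_k$ given $x_k$ is proper Gaussian with variance $\sigma_h^2 |x_k|^2 + \sigma_n^2$ and $|x_k|^2$ is exponentially distributed. The outer expectation is approximated by sampling, while the inner integral over $|x_k|^2$ uses a simple midpoint rule rather than the Hermite/Simpson approach of the cited references. All names and default values are illustrative.

```python
import math
import random

def output_entropy_mc(var_x=1.0, var_h=1.0, var_n=1.0,
                      n_samples=4000, n_nodes=150, seed=0):
    """Return (estimate of h(y_k), estimate of h(y_k | x_k), Gaussian upper bound)."""
    rng = random.Random(seed)
    # Midpoint nodes for the integral over z = |x_k|^2 ~ Exp(mean var_x),
    # rewritten as an integral over u in (0, 1) via the inverse CDF.
    z_nodes = [-var_x * math.log(1.0 - (m + 0.5) / n_nodes) for m in range(n_nodes)]
    var_nodes = [var_h * z + var_n for z in z_nodes]  # Var(y_k | x_k) at each node

    def log_p(r):
        """log p(y_k) as a function of r = |y_k|^2 (p is circularly symmetric)."""
        s = sum(math.exp(-r / v) / (math.pi * v) for v in var_nodes)
        return math.log(s / n_nodes)

    acc_h, acc_cond = 0.0, 0.0
    for _ in range(n_samples):
        z = rng.expovariate(1.0 / var_x)   # sample |x_k|^2
        v = var_h * z + var_n              # conditional variance of y_k
        r = v * rng.expovariate(1.0)       # sample |y_k|^2 given x_k
        acc_h -= log_p(r)
        acc_cond += math.log(math.pi * math.e * v)
    h_gauss = math.log(math.pi * math.e * (var_x * var_h + var_n))
    return acc_h / n_samples, acc_cond / n_samples, h_gauss

if __name__ == "__main__":
    h_est, h_cond, h_gauss = output_entropy_mc()
    print(f"h(y_k|x_k) ~ {h_cond:.3f} <= h(y_k) ~ {h_est:.3f} <= {h_gauss:.3f} (Gaussian bound)")
```

Up to sampling and quadrature error, the estimate should lie between the conditional entropy $h(y_k|x_k)$ and the entropy of a proper Gaussian random variable with the same variance.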



Appendix C
Sufficient Conditions for $\alpha_{\mathrm{opt}} = 1$ in (119)

In this appendix, we give conditions on the parameters $f_d$, $\rho$, and $\beta$ such that
$\alpha_{\mathrm{opt}} = 1$ in (119), i.e., the upper bound in (117) is maximized by choosing the maximum average
power $\sigma_x^2$. Therefore, we have to evaluate for which parameter choice the following inequality holds:

$$\frac{2 f_d}{\beta} \log\left(\frac{\rho \beta}{2 f_d} + 1\right) \le \frac{\rho}{1 + \rho}. \tag{171}$$

The following calculations are closely related to a corresponding problem in [8, Appendix C]. We divide
the evaluation into the two cases $\rho \ge 1$ and $\rho < 1$.

For $\rho \ge 1$, the RHS of (171) can be lower-bounded by

$$\frac{\rho}{1 + \rho} \ge \frac{1}{2} \tag{172}$$

yielding the following sufficient condition for (171) to hold:

$$\frac{2 f_d}{\beta} \log\left(\frac{\rho \beta}{2 f_d} + 1\right) \le \frac{1}{2}. \tag{173}$$

Thus, $\alpha_{\mathrm{opt}} = 1$ holds if

$$1 \le \rho \le \frac{2 f_d}{\beta} \left[\exp\left(\frac{1}{2} \frac{\beta}{2 f_d}\right) - 1\right]. \tag{174}$$

Now, we discuss the case $\rho < 1$. Using the inequality $\frac{1}{x} \log(x + 1) \le \frac{1}{\sqrt{x + 1}}$
for $x > 0$, for $\rho < 1$ the LHS of (171) can be upper-bounded by

$$\frac{2 f_d}{\beta} \log\left(\frac{\rho \beta}{2 f_d} + 1\right) \le \frac{\rho}{\sqrt{\frac{\rho \beta}{2 f_d} + 1}}. \tag{175}$$

Based on (175), inequality (171) holds if the following sufficient condition is fulfilled:

$$\frac{\rho}{\sqrt{\frac{\rho \beta}{2 f_d} + 1}} \le \frac{\rho}{1 + \rho} \tag{176}$$

so that we get the second condition

$$2 f_d \le \frac{\beta}{\rho + 2} \quad \text{for } \rho < 1. \tag{177}$$

Thus, if (174) or (177) is fulfilled, (119) yields $\alpha_{\mathrm{opt}} = 1$.
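The sufficiency of the two conditions can be checked numerically on a parameter grid. The sketch below assumes that (171) reads $\frac{2 f_d}{\beta}\log(\frac{\rho\beta}{2 f_d} + 1) \le \frac{\rho}{1+\rho}$, that (174) is equivalent to $\log(\frac{\rho\beta}{2 f_d} + 1) \le \frac{\beta}{4 f_d}$ for $\rho \ge 1$, and that (177) reads $2 f_d \le \beta/(\rho + 2)$ for $\rho < 1$. These forms are our reading of the derivation, and the function names are illustrative.

```python
import math

def lhs_171(f_d, rho, beta):
    """Left-hand side of the assumed form of (171)."""
    return (2.0 * f_d / beta) * math.log1p(rho * beta / (2.0 * f_d))

def rhs_171(rho):
    """Right-hand side of the assumed form of (171)."""
    return rho / (1.0 + rho)

def sufficient(f_d, rho, beta):
    """True if the sufficient condition for the respective SNR case is met."""
    if rho >= 1.0:
        # (174) written in the log domain to avoid overflow of exp().
        return math.log1p(rho * beta / (2.0 * f_d)) <= beta / (4.0 * f_d)
    return 2.0 * f_d <= beta / (rho + 2.0)  # (177)

def check_grid():
    """Count grid points where the sufficient condition holds; assert (171) there."""
    count = 0
    for f_d in (0.001, 0.01, 0.05, 0.1, 0.2, 0.4):
        for beta in (1.0, 2.0, 4.0, 8.0):
            for k in range(-30, 31):
                rho = 10.0 ** (k / 10.0)  # SNR from -30 dB to 30 dB
                if sufficient(f_d, rho, beta):
                    assert lhs_171(f_d, rho, beta) <= rhs_171(rho) + 1e-12
                    count += 1
    return count

print("grid points satisfying a sufficient condition:", check_grid())
```

The conditions are only sufficient, so there may be grid points where (171) holds although neither (174) nor (177) is satisfied.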

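Appendix D
Convexity of (149)

To prove that (149) is convex with respect to the individual diagonal elements of $\mathbf{Z}$, we rewrite the
prediction error variance $\sigma^2_{e_{\mathrm{pred}}}(\mathbf{x}_1^{N-1}) = \sigma^2_{e_{\mathrm{pred}}}(\mathbf{z})$ as follows:

$$\begin{aligned}
\sigma^2_{e_{\mathrm{pred}}}(\mathbf{z}) &= \sigma_h^2 - \mathbf{r}_{h,\mathrm{pred}}^H \left(\mathbf{R}_h + \sigma_n^2 \mathbf{Z}^{-1}\right)^{-1} \mathbf{r}_{h,\mathrm{pred}} \\
&\overset{(a)}{=} \sigma_h^2 - \mathbf{r}_{h,\mathrm{pred}}^H \left[\mathbf{R}_h^{-1} - \mathbf{R}_h^{-1} \left(\frac{1}{\sigma_n^2}\mathbf{Z} + \mathbf{R}_h^{-1}\right)^{-1} \mathbf{R}_h^{-1}\right] \mathbf{r}_{h,\mathrm{pred}} \\
&\overset{(b)}{=} \sigma_h^2 - \mathbf{r}_{h,\mathrm{pred}}^H \left[\mathbf{R}_h^{-1} - \mathbf{R}_h^{-1} \left(\frac{1}{\sigma_n^2}\mathbf{Z}_{\setminus i} + \mathbf{R}_h^{-1} + \frac{z_i}{\sigma_n^2}\mathbf{V}_i\right)^{-1} \mathbf{R}_h^{-1}\right] \mathbf{r}_{h,\mathrm{pred}} \\
&\overset{(c)}{=} \sigma_h^2 - \mathbf{r}_{h,\mathrm{pred}}^H \left[\mathbf{R}_h^{-1} - \mathbf{R}_h^{-1} \left(\frac{1}{\sigma_n^2}\mathbf{Z}_{\setminus i} + \mathbf{R}_h^{-1}\right)^{-1} \mathbf{R}_h^{-1}\right] \mathbf{r}_{h,\mathrm{pred}} \\
&\qquad - \frac{z_i\, \mathbf{r}_{h,\mathrm{pred}}^H \mathbf{R}_h^{-1} \left(\frac{1}{\sigma_n^2}\mathbf{Z}_{\setminus i} + \mathbf{R}_h^{-1}\right)^{-1} \frac{1}{\sigma_n^2}\mathbf{V}_i \left(\frac{1}{\sigma_n^2}\mathbf{Z}_{\setminus i} + \mathbf{R}_h^{-1}\right)^{-1} \mathbf{R}_h^{-1} \mathbf{r}_{h,\mathrm{pred}}}{1 + z_i \lambda_{\max}} \\
&\overset{(d)}{=} \sigma^2_{e_{\mathrm{pred}}}(\mathbf{z}_{\setminus i}) - \frac{a z_i}{1 + z_i \lambda_{\max}}
\end{aligned} \tag{178}$$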

where for (a) we have used the matrix inversion lemma, and for (b) we have separated the diagonal matrix
$\mathbf{Z}$ as follows:

$$\mathbf{Z} = \mathbf{Z}_{\setminus i} + z_i \mathbf{V}_i \tag{179}$$

where $\mathbf{Z}_{\setminus i}$ corresponds to $\mathbf{Z}$ except that the $i$-th diagonal element is set to 0,
$\mathbf{V}_i$ is a matrix with all elements zero except for the $i$-th diagonal element, which is equal to 1,
and $z_i$ is the $i$-th diagonal element of the matrix $\mathbf{Z}$. In addition, for (c) we have used the
Sherman-Morrison formula, and $\lambda_{\max}$ is the non-zero eigenvalue of the rank one matrix



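$$\mathbf{B} = \left(\frac{1}{\sigma_n^2}\mathbf{Z}_{\setminus i} + \mathbf{R}_h^{-1}\right)^{-1} \frac{1}{\sigma_n^2}\mathbf{V}_i. \tag{180}$$

Furthermore, for (d) we have substituted $\sigma^2_{e_{\mathrm{pred}}}(\mathbf{z}_{\setminus i})$ for

$$\sigma_h^2 - \mathbf{r}_{h,\mathrm{pred}}^H \left[\mathbf{R}_h^{-1} - \mathbf{R}_h^{-1} \left(\frac{1}{\sigma_n^2}\mathbf{Z}_{\setminus i} + \mathbf{R}_h^{-1}\right)^{-1} \mathbf{R}_h^{-1}\right] \mathbf{r}_{h,\mathrm{pred}}$$

which is the prediction error variance if the observation at the $i$-th time instant is not used for channel
prediction. Additionally, for (d) we have also used the substitution

$$a = \mathbf{r}_{h,\mathrm{pred}}^H \mathbf{R}_h^{-1} \left(\frac{1}{\sigma_n^2}\mathbf{Z}_{\setminus i} + \mathbf{R}_h^{-1}\right)^{-1} \frac{1}{\sigma_n^2}\mathbf{V}_i \left(\frac{1}{\sigma_n^2}\mathbf{Z}_{\setminus i} + \mathbf{R}_h^{-1}\right)^{-1} \mathbf{R}_h^{-1} \mathbf{r}_{h,\mathrm{pred}} \ge 0 \tag{181}$$

where the nonnegativity follows as $\mathbf{V}_i$ is positive semidefinite.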

Thus, with (178) we have found a separation of the channel prediction error variance
$\sigma^2_{e_{\mathrm{pred}}}(\mathbf{z})$ into the term $\sigma^2_{e_{\mathrm{pred}}}(\mathbf{z}_{\setminus i})$, which is
independent of $z_i$, and an additional term, which depends on $z_i$. Note that $a$ and $\lambda_{\max}$ in the
second term on the RHS of (178) are independent of $z_i$ and that the element $i$ is an arbitrarily
chosen element, i.e., we can use this separation for each diagonal element of the matrix $\mathbf{Z}$.

By substituting the RHS of (178) into (149) we get

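$$K = \log\left(1 + \frac{|x_N|^2}{\sigma_n^2}\left(\sigma^2_{e_{\mathrm{pred}}}(\mathbf{z}_{\setminus i}) - \frac{a z_i}{1 + z_i \lambda_{\max}}\right)\right). \tag{182}$$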



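Recall that we want to show the convexity of (182) with respect to the element $z_i$. Therefore, we
calculate its second derivative with respect to $z_i$, which is given by

$$\frac{\partial^2 K}{(\partial z_i)^2} = \frac{\frac{|x_N|^2}{\sigma_n^2}\, a\, 2\lambda_{\max} (1 + z_i \lambda_{\max}) \left\{1 + \frac{|x_N|^2}{\sigma_n^2}\left[\sigma^2_{e_{\mathrm{pred}}}(\mathbf{z}_{\setminus i}) - \frac{a \left(z_i + \frac{1}{2\lambda_{\max}}\right)}{1 + z_i \lambda_{\max}}\right]\right\}}{(1 + z_i \lambda_{\max})^4 \left(1 + \frac{|x_N|^2}{\sigma_n^2}\, \sigma^2_{e_{\mathrm{pred}}}(\mathbf{z})\right)^2}$$

and we will show that it is nonnegative, i.e.,

$$\frac{\partial^2 K}{(\partial z_i)^2} \ge 0. \tag{183}$$

Therefore, first we show that $\lambda_{\max}$ is nonnegative. This can be done based on the definition of the
eigenvalues of the matrix $\mathbf{B}$

$$\begin{aligned}
\mathbf{B}\mathbf{u} = \left(\frac{1}{\sigma_n^2}\mathbf{Z}_{\setminus i} + \mathbf{R}_h^{-1}\right)^{-1} \frac{1}{\sigma_n^2}\mathbf{V}_i \mathbf{u} &= \lambda_{\max} \mathbf{u} \\
\Rightarrow \quad \frac{1}{\sigma_n^2}\mathbf{V}_i \mathbf{u} &= \lambda_{\max} \left(\frac{1}{\sigma_n^2}\mathbf{Z}_{\setminus i} + \mathbf{R}_h^{-1}\right) \mathbf{u} \\
\Rightarrow \quad \frac{1}{\sigma_n^2}\mathbf{u}^H \mathbf{V}_i \mathbf{u} &= \lambda_{\max} \mathbf{u}^H \left(\frac{1}{\sigma_n^2}\mathbf{Z}_{\setminus i} + \mathbf{R}_h^{-1}\right) \mathbf{u} \\
\overset{(a)}{\Rightarrow} \quad \lambda_{\max} &\ge 0
\end{aligned}$$

where (a) follows from the fact that the eigenvalues of $\left(\frac{1}{\sigma_n^2}\mathbf{Z}_{\setminus i} + \mathbf{R}_h^{-1}\right)$
are nonnegative, as $\mathbf{R}_h$ is positive definite and the diagonal entries of the diagonal matrix
$\mathbf{Z}_{\setminus i}$ are also nonnegative. In addition, $\mathbf{V}_i$ is also positive semidefinite.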

With $\lambda_{\max}$, $z_i$, and $a$ being nonnegative, for the proof of (183) it remains to show that

$$\sigma^2_{e_{\mathrm{pred}}}(\mathbf{z}_{\setminus i}) - \frac{a \left(z_i + \frac{1}{2\lambda_{\max}}\right)}{1 + z_i \lambda_{\max}} \ge 0. \tag{184}$$

To prove this inequality, we calculate the derivative of the LHS of (184) with respect to $z_i$, which is given
by

$$\frac{\partial}{\partial z_i}\left\{\sigma^2_{e_{\mathrm{pred}}}(\mathbf{z}_{\setminus i}) - \frac{a \left(z_i + \frac{1}{2\lambda_{\max}}\right)}{1 + z_i \lambda_{\max}}\right\} = -\frac{a}{2 (1 + z_i \lambda_{\max})^2} \le 0 \tag{185}$$

where for the last inequality we have used (181), i.e., the LHS of (184) monotonically decreases in $z_i$.
Furthermore, for $z_i \to \infty$ the LHS of (184) becomes

$$\lim_{z_i \to \infty}\left\{\sigma^2_{e_{\mathrm{pred}}}(\mathbf{z}_{\setminus i}) - \frac{a \left(z_i + \frac{1}{2\lambda_{\max}}\right)}{1 + z_i \lambda_{\max}}\right\} \overset{(a)}{=} \lim_{z_i \to \infty} \sigma^2_{e_{\mathrm{pred}}}(\mathbf{z}) \overset{(b)}{\ge} 0 \tag{186}$$

where (a) follows due to (178), and where (b) holds as the prediction error variance must be nonnegative.
As the LHS of (184) is monotonically decreasing in $z_i$ and as its limit for $z_i \to \infty$ is nonnegative,
(184) must hold.

Thus, with (184), inequality (183) holds and, thus, (182) is convex in $z_i$. As the element $i$ has been
chosen arbitrarily, in conclusion, we have shown that (149) is convex in each $z_i$ for $i = 1, \ldots, N - 1$.
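The scalar part of this argument can be sanity-checked numerically. The sketch below assumes that, as a function of a single element $z_i$, the quantity in (182) has the form $K(z) = \log(1 + c\,(s - \frac{a z}{1 + \lambda z}))$, where $c$ stands for $|x_N|^2/\sigma_n^2$, $s$ for $\sigma^2_{e_{\mathrm{pred}}}(\mathbf{z}_{\setminus i})$, and $\lambda$ for $\lambda_{\max} > 0$, with $s - a/\lambda \ge 0$. It compares the resulting closed-form second derivative with a central finite difference. The parameter values are arbitrary illustrative choices.

```python
import math

def K(z, c, s, a, lam):
    """Assumed scalar form of (182) as a function of one diagonal element z."""
    return math.log(1.0 + c * (s - a * z / (1.0 + lam * z)))

def d2K(z, c, s, a, lam):
    """Closed-form second derivative of K with respect to z (requires lam > 0)."""
    var = s - a * z / (1.0 + lam * z)
    bracket = 1.0 + c * (s - a * (z + 1.0 / (2.0 * lam)) / (1.0 + lam * z))
    num = c * a * 2.0 * lam * (1.0 + lam * z) * bracket
    den = (1.0 + lam * z) ** 4 * (1.0 + c * var) ** 2
    return num / den

def second_difference(z, c, s, a, lam, h=1e-3):
    """Central finite-difference approximation of the second derivative."""
    return (K(z + h, c, s, a, lam) - 2.0 * K(z, c, s, a, lam) + K(z - h, c, s, a, lam)) / (h * h)

c, s, a, lam = 2.0, 1.0, 0.4, 0.5   # satisfies s - a / lam = 0.2 >= 0
for z in (0.1, 1.0, 5.0, 20.0):
    print(z, d2K(z, c, s, a, lam), second_difference(z, c, s, a, lam))
```

For these parameters the closed-form expression and the finite difference agree and are nonnegative, as expected from (183).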